Not sure if it's been posted yet, my email is currently down...

http://weblog.infoworld.com/yager/archives/2007/10/suns_zfs_is_clo.html

Interesting piece. This is the second post from Yager that shows solaris in a pretty good light. I particularly like his closing comment:

"If you haven't checked out ZFS yet, do, because it will eventually become ubiquitously implemented in IT. It is too brilliant not to be."

Francois


This message posted from opensolaris.org
On 24-Oct-07, at 3:24 PM, Francois Dion wrote:

> Not sure if it's been posted yet, my email is currently down...
>
> http://weblog.infoworld.com/yager/archives/2007/10/suns_zfs_is_clo.html
>
> Interesting piece. This is the second post from Yager that shows
> solaris in a pretty good light. I particularly like his closing
> comment:
>
> "If you haven't checked out ZFS yet, do, because it will eventually
> become ubiquitously implemented in IT. It is too brilliant not to be."

If he can get so excited about the 'pool' concept alone, what happens when he tells us about the rest? :)

--Toby

> Francois
Having gotten a bit tired of the level of ZFS hype floating around these days (especially that which Jonathan has chosen to associate with his spin surrounding the fracas with NetApp), I chose to respond to that article yesterday. I did attempt to be fair, and would appreciate feedback if anything I said was not (since I would not wish to repeat it elsewhere and would be happy to correct it there).

- bill


This message posted from opensolaris.org
Hello can,

Monday, November 5, 2007, 4:42:14 AM, you wrote:

cyg> Having gotten a bit tired of the level of ZFS hype floating
cyg> around these days (especially that which Jonathan has chosen to
cyg> associate with his spin surrounding the fracas with NetApp), I
cyg> chose to respond to that article yesterday. I did attempt to be
cyg> fair and would appreciate feedback if anything I said was not
cyg> (since I would not wish to repeat it elsewhere and would be happy to correct it there).

Bill - I have a very strong impression that for whatever reason you're trying really hard to fight ZFS. Are you a NetApp employee? :)

Journaling vs ZFS - well, I've been managing some rather large environments, and having to run fsck (even with journaling) from time to time, which takes 24-50 hours for a file system... After migrating to ZFS, no more such issues (there was a bug in ZFS which caused long boot times after a hard reboot in some cases, but it's been fixed for some time now). The same happens on ext2/3 - from time to time you've got to run fsck.

ZFS end-to-end checksumming - well, you definitely underestimate it. While I have yet to see any checksum error reported by ZFS on Symmetrix arrays or FC/SAS arrays, with some other "cheap" HW I've seen many of them (which explained the need for fsck from time to time). Then check this list for other reports of checksum errors from people running on home x86 equipment.

Then you're complaining that ZFS isn't novel... well, compared to other products, ease of management and a rich feature set, all in one, is a good enough reason for some environments. While WAFL offers checksumming, it's done differently and offers less protection than what ZFS does. Then you've got built-in compression, which is not only about reducing disk usage but also about improving performance (a real case here in production). Then integration with NFS, iSCSI, CIFS (soon), getting rid of /etc/vfstab, snapshots/clones and sending incremental snapshots.

Ability to send incremental snapshots - that actually changes the game in some environments. If you've got a lot of cheap storage with a lot of small files and you want to keep it cheap, the problem is you basically have no means to back it up... with zfs send suddenly you can. Which means you are able to keep it cheap and have your backups (to remote storage).

Then... ok, enough :)

-- 
Best regards,
Robert Milkowski                 mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On 7-Nov-07, at 9:32 AM, Robert Milkowski wrote:> Hello can, > > Monday, November 5, 2007, 4:42:14 AM, you wrote: > > cyg> Having gotten a bit tired of the level of ZFS hype floating > cyg> around these days (especially that which Jonathan has chosen to > cyg> associate with his spin surrounding the fracas with NetApp), I > cyg> chose to respond to that article yesterday. I did attempt to be > cyg> fair and would appreciate feedback if anything I said was not > cyg> (since I would not wish to repeat it elsewhere and would be > happy to correct it there). > > Bill - I have a very strong impression that for whatever reason you''re > trying really hard to fight ZFS. Are you NetApp employee? :) > ... > > ZFS end-to-end checksumming - well, you definitely underestimate it.Everyone has their favourite feature, this is mine; but there''s also that pesky fact that nobody and nothing else can do this. --Toby> > > Then... ok, enough :) > > -- > Best regards, > Robert Milkowski mailto:rmilkowski at task.gda.pl > http://milek.blogspot.com > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> Monday, November 5, 2007, 4:42:14 AM, you wrote: > > cyg> Having gotten a bit tired of the level of ZFS > hype floating > cyg> around these days (especially that which > Jonathan has chosen to > cyg> associate with his spin surrounding the fracas > with NetApp)...> Bill - I have a very strong impression that for > whatever reason you''re > trying really hard to fight ZFS.That impression is incorrect, but probably understandable in someone with an obvious bias in the opposite direction. Are you NetApp> employee? :)Nope - no connection whatsoever, ever. Perhaps you should consider taking what I said above (and included here so that you can read it again if you skimmed it the first time) at face value.> > Journaling vs ZFS - well, I''ve been managing some > rather large > environment and having fsck (even with journaling) > from time to time''From time to time'' suggests at least several occurrences: just how many were there? What led you to think that doing an fsck was necessary? What did journaling fail to handle? What journaling file system were you using? ...> The same happens on ext2/3 - from time to time you''ve > got to run fsck.Of course using ext2 sometimes requires fscking, but please be specific about when ext3 does.> > ZFS end-to-end checksumming - well, you definitely > underestimate it.Au contraire: I estimate its worth quite accurately from the undetected error rates reported in the CERN "Data Integrity" paper published last April (first hit if you Google ''cern "data integrity"'').> While I have yet to see any checksum error reported > by ZFS on > Symmetrix arrays or FC/SAS arrays with some other > "cheap" HW I''ve seen > many of themWhile one can never properly diagnose anecdotal issues off the cuff in a Web forum, given CERN''s experience you should probably check your configuration very thoroughly for things like marginal connections: unless you''re dealing with a far larger data set than CERN was, you shouldn''t have seen ''many'' checksum errors. (which explained need for fsck from time> to time).Since fsck does not detect errors in user data (unless you''re talking about *detectable* errors due to ''bit rot'' which a full surface scan could discover, the incidence of which is just not very high in disks run within spec), and since user data comprises the vast majority of disk data in most installations, something sounds a bit strange here. Are you saying that you ran fsck after noticing some otherwise-undetected error in your user data? If so, did fsck find anything additional wrong when you ran it? In any event, finding and fixing the hardware that is likely to be producing errors at the levels you suggest should be a high priority even if ZFS helps you discover the need for this in the first place (other kinds of checks could also help discover such problems, but ZFS does make it easy and provides an additional level of protection until their underlying causes have been corrected).> Then check this list for other reports on checksum > errors from people > running on home x86 equipment.Such anecdotal information (especially from enthusiasts) is of limited value, I''m afraid, especially when compared with a more quantitative study like CERN''s. 
Then again, many home systems may be put together less carefully than CERN''s (or for that matter my own) are.> > Then you''re complaining that ZFS isn''t novel...When you paraphrase people and don''t choose to quote them directly, it''s a good idea at least to point to the material that you''re purportedly representing - keeps you honest, even if you *think* you''re being honest already. I certainly don''t ever recall saying anything like that, so I''ll ask you for that reference. I *have* suggested that *some* portions of ZFS are not as novel as Sun (perhaps I should have been more specific and said "Jonathan", since it''s his recent spin in such areas that I find particularly offensive) seems to be suggesting that they are. well> comparing to other > products easy of management and rich of features, all > in one, is a > good enough reason for some environments.Then why can''t those over-hyping ZFS limit themselves to that kind of (entirely reasonable) assessment? While WAFL> offers > checksumming its done differently which does offer > less protection > than what ZFS does.I''m afraid that you just don''t know what you''re talking about, Robert - and IIRC I''ve corrected you on this elsewhere, so you have no excuse for repeating your misconceptions now. WAFL provides not one but two checksum mechanisms which separate the checksums and their updates from the data that they protect and hence should offer every bit as much protection as ZFS''s checksums do. Then you''ve got built-in> compression which is not > only about reducing disk usage but also improving > performance (real > case here in a production).Are you seriously suggesting that compression qualifies as a ''novel'' feature in a file system? Or did you just kind of lump it into your paragraph which began apparently as a rebuttal to a comment I doubt I ever made in the first place?> > Then integration with NFS,iSCSI,CIFS (soon), getting > rid of > /etc/vfstab, snapshots/clones and sending incremental > snapshots.Can you say "WAFL clone"?> > Ability to send incremental snapshots - that actually > changes a game > in some environments.Save for those using WAFL, of course - or, for that matter, any system that supports snapshots to which incremental backup utilities may be applied. I wouldn''t keep harping on this if the underlying subject here weren''t how ''novel'' ZFS is... The funny thing is, I *like* ZFS, by and large. I had been roughing out the design of a write-anywhere-when-it-makes-sense file system with in-parent checksum protection and enhanced metadata redundancy before I ever heard of ZFS, and when I did hear about it I was impressed that a major corporation had had the initiative to fund development in what I consider to be a neglected area. I do consider the RAID-Z design to be somewhat brain-damaged, and still believe that using a transaction log makes more sense (when its presence is thoroughly leveraged) than the full-tree-path write-back approach that ZFS uses (it''s especially ironic that they wound up needing a log *anyway*) - though this has significant implications for how one approaches snapshots and CDP, and don''t believe they use disk pools as flexibly as they could, and was disappointed that they didn''t build something more easily extensible to a distributed approach, but still think that ZFS is the kind of ''measurable stride forward'' in storage that I recently characterized it as being. 
However, I hadn''t realized (with respect to ZFS or to my own design) just how much in this area NetApp had implemented first. I (and quite possibly the ZFS implementors as well) approached the problem not *through* WAFL but via an independent path that happened to lead to a rather similar destination. I thought of WAFL''s ''write anywhere'' policy as applying primarily to RAID-4 stripes, since that''s how it usually tries to aggregate data in NVRAM, whereas mine (and ZFS''s) felt more like old-style ''shadow paging'', in my case as modified by use of a transaction log. And I had no idea that WAFL had implemented separate checksum protection. Not that I have any opinion one way or the other when it comes to patent enforceability: unlike a lot of people, I understand just how specialized the qualifications required to make an informed assessment in that area are. The older I get, the more disturbing it is to see just how unobjective most people really are. It''s bad enough in politics, but I used to think that technical types were at least somewhat better at viewing things analytically - probably because technical merit so consistently trumped bias and bluster during my early experiences at DEC (though looking back on that now I can recall an occasional possible exception even there). In some ways I see engineers as being one of the few remaining checks on marketing excesses, much as the court system is supposed to be the ultimate check on political excesses. I value those checks enough that I''m willing to devote some effort to keeping them working properly. - bill This message posted from opensolaris.org
On 11/7/07, can you guess? <billtodd at metrocast.net> wrote:> > Monday, November 5, 2007, 4:42:14 AM, you wrote: > > > > cyg> Having gotten a bit tired of the level of ZFS > > hype floatingI think a personal comment might help here ... I spend a large part of my life doing system administration, and like most Solaris 2.5.1/2.6/2.7/8/9 administrators that time was spend on medium to large SPARC systems ... at companies who could afford it (in my case a financial institution). With that came all of the things you expect from really big companies .. rigorous change control, pain in the ass (borderline neurotic) management ... clever as can be database administrators, sub par backups systems forever taking too long and all sorts of third party storage technologies, each with their own issues. Since applications and data drive the world, you need decent tools to get the best from it. So, Veritias and storage vendors like EMC became hero''s overnight in 100% of the organizations I knew off, including mine. My last (active) administration days was spend armed with Solaris 9, ufs with logging, Veritas Volume "Dam"ager ;) and HDS/Symmetrix kit... which was rock solid, fast enough and with a decent amount of braincells not too much of a problem to maintain (in a large SAN environment). Why change ...? Its all good... budget''s never come under pressure.. right? Expensive RISC systems would never be replaced by cheap commodity parts... Hang on, you tell me I can pop in Solaris 10, slap in ZFS ... reduce most of my storage footprint to JBOD''s ... (and all of this on a little old AMD system.).. You must be joking! Why would I consider a new solution that is safe, fast enough, stable .. easier to manage and lots cheaper? (That''s my fanboy hat, please excuse) If administrators consider the above and generate "hype" .. well, then "hype" away. I believe the proof is in the pudding, so I will personally hold out some more, until I see (work on) a fully operational ZFS replacement of the "old school" configurations. (For me that only becomes a reality when the jbod kit is released next year.) But I do believe that some of the "hype" is justified ... maybe not for a Linux administrator running his (apache/php/mysql) company from a HP380 with a smartarray controller ... but sure as hell for an old retired administrator from Solaris environments of yesteryear. PS: I don''t think being sucked into a spec sheet battle between various file systems makes any sense ... it''s about using technology to give your business an edge ... and zfs (alone) could have changed my old environment dramatically. Thats why I believe people should allow for just a little bit of ... "hype" :)
On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
> I do consider the RAID-Z design to be somewhat brain-damaged [...]

How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Adam

-- 
Adam Leventhal, FishWorks                  http://blogs.sun.com/ahl
Economics for one.

We run a number of testing environments which mimic the production one. But we don't want to spend $750,000 on EMC storage each time when something costing $200,000 will do the job we need. At the moment we have over 100TB on four SE6140s and we're very happy with the solution. ZFS is saving a lot of money for us because it enables solutions that weren't viable before.

> Hang on, you tell me I can pop in Solaris 10, slap in ZFS ... reduce
> most of my storage footprint to JBODs ... (and all of this on a
> little old AMD system.) .. You must be joking!
>
> Why would I consider a new solution that is safe, fast enough, stable
> .. easier to manage and lots cheaper? (That's my fanboy hat, please
> excuse)
On 11/8/07, Mark Ashley <mark at ibiblio.org> wrote:
> Economics for one.

Yep, for sure ... it was a rhetorical question ;)

> > Why would I consider a new solution that is safe, fast enough, stable
> > .. easier to manage and lots cheaper?

Rephrase, "Why would I NOT consider ...?" :)
> Au contraire: I estimate its worth quite accurately from the undetected error rates reported in the CERN "Data Integrity" paper published last April (first hit if you Google 'cern "data integrity"').
>
> > While I have yet to see any checksum error reported
> > by ZFS on
> > Symmetrix arrays or FC/SAS arrays with some other
> > "cheap" HW I've seen
> > many of them
>
> While one can never properly diagnose anecdotal issues off the cuff in a Web forum, given CERN's experience you should probably check your configuration very thoroughly for things like marginal connections: unless you're dealing with a far larger data set than CERN was, you shouldn't have seen 'many' checksum errors.

Well, single-bit error rates may be rare in hard drives in normal operation, but from a systems perspective, data can be corrupted anywhere between disk and CPU.

I know you're not interested in anecdotal evidence, but I had a box that was randomly corrupting blocks during DMA. The errors showed up when doing a ZFS scrub and I caught the problem in time. Without a checksummed filesystem, I would most likely only have discovered the problem when an important fs metadata block was corrupted. At that point there would be serious silent damage to user data.

Checksumming is a safety belt that confirms that the system is working as it was designed, and lets user processes know that the data they put down is the same as the data they get back, which can only be a good thing. Like others have said for big business; as a consumer I can reasonably comfortably buy off-the-shelf cheap controllers and disks, and know that should any part of the system be flaky enough to cause data corruption, the software layer will catch it, which both saves money and creates peace of mind.

James
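For what it's worth, the "safety belt" idea can be summed up in a few lines. A minimal sketch (illustrative Python only, not actual ZFS code; the block and checksum stores are hypothetical) of end-to-end verification: the checksum is computed above the whole I/O path and re-verified on read, so corruption introduced anywhere in between (controller, cable, DMA, disk) is detected rather than silently returned.

    # Minimal sketch of end-to-end checksumming (illustrative only, not ZFS code).
    # Assumptions: 'disk' stands in for any storage path that may silently corrupt
    # data; checksums are kept separately from the data they protect.
    import hashlib

    class EndToEndStore:
        def __init__(self):
            self.disk = {}        # block_id -> bytes (may be silently corrupted)
            self.checksums = {}   # block_id -> digest, held with the parent/metadata

        def write(self, block_id, data):
            # Checksum computed in the filesystem/application layer,
            # before the data is handed to the I/O path.
            self.checksums[block_id] = hashlib.sha256(data).digest()
            self.disk[block_id] = data

        def read(self, block_id):
            data = self.disk[block_id]
            if hashlib.sha256(data).digest() != self.checksums[block_id]:
                # Corruption anywhere between CPU and platter is caught here;
                # a redundant copy could then be used to repair the block.
                raise IOError("checksum mismatch on block %r" % (block_id,))
            return data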
Hello can,>> >> Journaling vs ZFS - well, I''ve been managing some >> rather large >> environment and having fsck (even with journaling) >> from time to timecyg> ''From time to time'' suggests at least several occurrences: just cyg> how many were there? What led you to think that doing an fsck cyg> was necessary? What did journaling fail to handle? What cyg> journaling file system were you using? From time to time means several times a year in entire environment. UFS+ with journaling were used. Why fsck was necessary? Well, file system during normal operation started to complain about some inodes, remounted itself to RO (or locked) and printed in logs: please run fsck. Then you''ve got about 20-30 hours to wait, then sometimes after fsck finished it asked you that you have to re-run fsck... We''ve migrated to ZFS and all these problems are gone, except some checksum errors.>> The same happens on ext2/3 - from time to time you''ve >> got to run fsck.cyg> Of course using ext2 sometimes requires fscking, but please be specific about when ext3 does. You reboot and it asks you for fsck - rarely but still. Also use google and you''ll find other users in similar situation (forced to use fsck on ext3).>> ZFS end-to-end checksumming - well, you definitely >> underestimate it.cyg> Au contraire: I estimate its worth quite accurately from the cyg> undetected error rates reported in the CERN "Data Integrity" cyg> paper published last April (first hit if you Google ''cern "data integrity"'').>> While I have yet to see any checksum error reported >> by ZFS on >> Symmetrix arrays or FC/SAS arrays with some other >> "cheap" HW I''ve seen >> many of themcyg> While one can never properly diagnose anecdotal issues off the cyg> cuff in a Web forum, given CERN''s experience you should probably cyg> check your configuration very thoroughly for things like marginal cyg> connections: unless you''re dealing with a far larger data set cyg> than CERN was, you shouldn''t have seen ''many'' checksum errors. Maybe I shouldn''t get checksum errors but I do. Then sometimes it''s a HW/firmware problem - check with EMC Clariion on SATA disks - there was a bug which caused data corruption (array just said that some sectors were lost but RAID still continued to work). Then there was (probably still is) a problem on IBM''s FastT arrays with SATA disks which is causing data corruption. Then Sun''s 3511 array with SATA disks in RAID-5 had also a bug which casued data corruption... just go to vendors bug databases and look for data corruption and you would be surprised (I was). Even with "simple" PCI cards you will find bugs causing silent data corruption (I like one with IBM''s ServeRAID card which occured only if you got 8+GB of memory in a host). Then we''ve got quite a lot of storage on x4500 too, and so far no single checksum error detected by ZFS - so you''re right it''s not only about disk but rather about entire solution (disks, controllers, firmware on all levels, connections, switches, HBAs, drivers, ...). The point is that ZFS gives you really good protection in all these cases and it has already pay off for me. cyg> Since fsck does not detect errors in user data (unless you''re cyg> talking about *detectable* errors due to ''bit rot'' which a full cyg> surface scan could discover, the incidence of which is just not cyg> very high in disks run within spec), and since user data cyg> comprises the vast majority of disk data in most installations, cyg> something sounds a bit strange here. 
Are you saying that you ran cyg> fsck after noticing some otherwise-undetected error in your user cyg> data? If so, did fsck find anything additional wrong when you ran it? I know it doesn''t check user data - we only run fsck when we had to due to fs being remounted RO or locked with an advice to run fsck. Sometimes some inodes were fixed sometimes fsck didn''t detect anything. Nevertheless to get fs working we had to fsck. Also very rarely we have actually found some files with corrupted content - as we''ve developed all applications we thought the problem was with a bug in our code - once migrated to zfs no single occurance of bad files so probably it wasn''t our code after all. Some problems were related to firmware bugs on IBM''s and other arrays, some maybe due to other reasons. cyg> In any event, finding and fixing the hardware that is likely to cyg> be producing errors at the levels you suggest should be a high cyg> priority even if ZFS helps you discover the need for this in the cyg> first place (other kinds of checks could also help discover such cyg> problems, but ZFS does make it easy and provides an additional cyg> level of protection until their underlying causes have been corrected). I agree. The point is that before ZFS we weren''t even sure where is the problem and main suspects were applications. Once we started moving to ZFS the main suspect was ZFS badly reporting. Then it turned out there''s a bug in IBM''s firmware... then in EMC (fixed), then still we got some errors - unfortunatelly after changing FC cables, GBICs, etc. the problem is still there from time to time for whatever reason. Fortunately thanks to ZFS the problem is not propagated to application.>> Then check this list for other reports on checksum >> errors from people >> running on home x86 equipment.cyg> Such anecdotal information (especially from enthusiasts) is of cyg> limited value, I''m afraid, especially when compared with a more cyg> quantitative study like CERN''s. Then again, many home systems cyg> may be put together less carefully than CERN''s (or for that matter my own) are. Then you probably got more homogenic environment in CERN. As I wrote - we do have lot of storage on x4500 with no single checksum error so far. So it''s about entire solution and your experience may vary from environment to environment. The point again is ZFS makes your life better if you are unlucky one.>> Then you''re complaining that ZFS isn''t novel...cyg> When you paraphrase people and don''t choose to quote them cyg> directly, it''s a good idea at least to point to the material that cyg> you''re purportedly representing - keeps you honest, even if you cyg> *think* you''re being honest already. cyg> I certainly don''t ever recall saying anything like that, so I''ll cyg> ask you for that reference. I *have* suggested that *some* cyg> portions of ZFS are not as novel as Sun (perhaps I should have cyg> been more specific and said "Jonathan", since it''s his recent cyg> spin in such areas that I find particularly offensive) seems to be suggesting that they are. http://weblog.infoworld.com/yager/archives/2007/10/suns_zfs_is_clo.html "So while ZFS really isn''t all that ''close to perfect'', nor as entirely novel as Sun might have one believe [...]" I''m sorry you''re right. You write it differently. 
cyg> well
>> comparing to other
>> products easy of management and rich of features, all
>> in one, is a
>> good enough reason for some environments.

cyg> Then why can't those over-hyping ZFS limit themselves to that
cyg> kind of (entirely reasonable) assessment?

Well, because if you want to win, as we all know, it's not only about technology. You can provide the best technology on the market, but without proper marketing and hype you will probably lose...

>> While WAFL
>> offers
>> checksumming its done differently which does offer
>> less protection
>> than what ZFS does.

cyg> I'm afraid that you just don't know what you're talking about,
cyg> Robert - and IIRC I've corrected you on this elsewhere, so you
cyg> have no excuse for repeating your misconceptions now.

I haven't spotted your correction.

cyg> WAFL provides not one but two checksum mechanisms which separate
cyg> the checksums and their updates from the data that they protect
cyg> and hence should offer every bit as much protection as ZFS's checksums do.

Can you point to any document? Well, the link below (to a NetApp page) says that the checksum and the data block are in the same block, so it's nothing like ZFS - unless they changed something and haven't updated that page. Again, can you point to some documentation?

http://www.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html

"Brace yourself, because we saved the most insidious disk problem for last. With extreme rarity, a disk malfunction occurs in which a write operation fails but the disk is unable to detect the write failure and signals a successful write status. This event is called a "lost write," and it causes silent data corruption if no detection and correction mechanism is in place. You might think that checksums and RAID will protect you against this type of failure, but that isn't the case. Checksums are written in the block metadata, coresident with the block, during the same I/O. In this failure mode, neither the block nor the checksum gets written, so what you see on disk is the previous data that was written to that block location with a valid checksum. Only NetApp, with its innovative WAFL (Write Anywhere File Layout) storage virtualization technology closely integrated with RAID, identifies this failure. WAFL never rewrites a block to the same location. If a block is changed, it is written to a new location, and the old block is freed. The identity of a block changes each time it is written. WAFL stores the identity of each block in the block's metadata and cross checks the identity on each read to ensure that the block being read belongs to the file and has the correct offset. If not, the data is recreated using RAID. The check doesn't have any performance impact."

When it comes to NetApp - they are really great. However, thanks to ZFS I can basically get the same and more than what NetApp offers while spending much less money at the same time.

-- 
Best regards,
Robert Milkowski                 mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
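The NetApp text quoted above is essentially describing the difference between a checksum written in the same I/O as the block and an identity/checksum held in the block's parent. A minimal sketch (illustrative Python, not WAFL or ZFS code; the structures are hypothetical) of why a lost write defeats the former but not the latter:

    # Illustrative sketch: a lost write vs. co-resident and parent-held checksums.
    import hashlib

    def digest(data):
        return hashlib.sha256(data).digest()

    # Co-resident scheme: the checksum is written in the same I/O as the block.
    disk_block = {"data": b"old contents", "csum": digest(b"old contents")}
    # A lost write of b"new contents" leaves the old block (and its old, still
    # self-consistent checksum) untouched, so a later read verifies "successfully":
    assert disk_block["csum"] == digest(disk_block["data"])   # stale data goes undetected

    # Parent-held scheme: the expected checksum lives in the parent block pointer,
    # which is written separately, so the stale child no longer matches it.
    parent = {"child_csum": digest(b"new contents")}
    child_on_disk = b"old contents"                            # the child write was lost
    if digest(child_on_disk) != parent["child_csum"]:
        print("lost write detected; reconstruct the block from redundancy")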
> On 11/7/07, can you guess? <billtodd at metrocast.net> > wrote: > > > Monday, November 5, 2007, 4:42:14 AM, you wrote: > > > > > > cyg> Having gotten a bit tired of the level of > ZFS > > > hype floating...> But I do believe that some of the "hype" is justifiedJust to make it clear, so do I: it''s the *unjustified* hype that I''ve objected to (as my comments on the Yager article should have made clear). I believe that ZFS will, for at least some installations and workloads and when it has achieved the requisite level of reliability (both actual and perceived), allow some people to replace the kind of expensive equipment that you describe with commodity gear - and make managing the installation easier in the process. That, in my opinion, is its greatest strength; almost everything else is by comparison down in the noise level. However, ZFS is not the *only* open-source approach which may allow that to happen, so the real question becomes just how it compares with equally inexpensive current and potential alternatives (and that would make for an interesting discussion that I''m not sure I have time to initiate tonight). - bill This message posted from opensolaris.org
> On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
> > I do consider the RAID-Z design to be somewhat brain-damaged [...]
>
> How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Nope. A decent RAID-5 hardware implementation has no 'write hole' to worry about, and one can make a software implementation similarly robust with some effort (e.g., by using a transaction log to protect the data-plus-parity double update or by using COW mechanisms like ZFS's in a more intelligent manner).

The part of RAID-Z that's brain-damaged is its concurrent small-to-medium-sized-access performance (at least up to request sizes equal to the largest block size that ZFS supports, and arguably somewhat beyond that): while conventional RAID-5 can satisfy N+1 small-to-medium read accesses or (N+1)/2 small-to-medium write accesses in parallel (though the latter also take an extra rev to complete), RAID-Z can satisfy only one small-to-medium access request at a time (well, plus a smidge for read accesses if it doesn't verify the parity) - effectively providing RAID-3-style performance.

The easiest way to fix ZFS's deficiency in this area would probably be to map each group of N blocks in a file as a stripe with its own parity - which would have the added benefit of removing any need to handle parity groups at the disk level (this would, incidentally, not be a bad idea to use for mirroring as well, if my impression is correct that there's a remnant of LVM-style internal management there). While this wouldn't allow use of parity RAID for very small files, in most installations they really don't occupy much space compared to that used by large files, so this should not constitute a significant drawback.

- bill


This message posted from opensolaris.org
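A rough back-of-the-envelope version of the concurrency claim above, under the simplifying assumptions that every small read touches a single data disk in RAID-5 but spans all data disks in RAID-Z, and that each disk services one request at a time (ignoring caching and read-ahead):

    # Sketch of concurrent small-read capability under the stated assumptions.
    def concurrent_small_reads(data_disks, parity_disks, layout):
        total_disks = data_disks + parity_disks
        if layout == "raid5":
            return total_disks   # each small read lands on a single, different disk
        if layout == "raidz":
            return 1             # each block is striped across every data disk
        raise ValueError(layout)

    for layout in ("raid5", "raidz"):
        print(layout, concurrent_small_reads(data_disks=4, parity_disks=1, layout=layout))
    # raid5 5
    # raidz 1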
> > Au contraire: I estimate its worth quite accurately from the undetected
> > error rates reported in the CERN "Data Integrity" paper published last
> > April (first hit if you Google 'cern "data integrity"').
> >
> > > While I have yet to see any checksum error reported
> > > by ZFS on
> > > Symmetrix arrays or FC/SAS arrays with some other
> > > "cheap" HW I've seen
> > > many of them
> >
> > While one can never properly diagnose anecdotal issues off the cuff in a
> > Web forum, given CERN's experience you should probably check your
> > configuration very thoroughly for things like marginal connections:
> > unless you're dealing with a far larger data set than CERN was, you
> > shouldn't have seen 'many' checksum errors.
>
> Well single bit error rates may be rare in normal operation hard
> drives, but from a systems perspective, data can be corrupted anywhere
> between disk and CPU.

The CERN study found that such errors (if they found any at all, which they couldn't really be sure of) were far less common than the manufacturer's spec for plain old detectable but unrecoverable bit errors, or than the one hardware problem that they discovered (a disk firmware bug that appeared related to the unusual demands and perhaps negligent error reporting of their RAID controller, and which caused errors at a rate about an order of magnitude higher than the nominal spec for detectable but unrecoverable errors).

This suggests that in a ZFS-style installation without a hardware RAID controller they would have experienced at worst a bit error about every 10^14 bits or 12 TB (the manufacturer's spec rate for detectable but unrecoverable errors) - though some studies suggest that the actual incidence of 'bit rot' is considerably lower than such specs. Furthermore, simply scrubbing the disk in the background (as I believe some open-source LVMs are starting to do, and for that matter some disks are starting to do themselves) would catch virtually all such errors in a manner that would allow a conventional RAID to correct them, leaving a residue of something more like one error per PB that ZFS could catch better than anyone else save WAFL.

> I know you're not interested in anecdotal
> evidence,

It's less that I'm not interested in it than that I don't find it very convincing when actual quantitative evidence is available that doesn't seem to support its importance. I know very well that things like lost and wild writes occur, as well as the kind of otherwise undetected bus errors that you describe, but the available evidence seems to suggest that they occur in such small numbers that catching them is of at most secondary importance compared to many other issues. All other things being equal, I'd certainly pick a file system that could do so, but when other things are *not* equal I don't think it would be a compelling attraction.

> but I had a box that was randomly
> corrupting blocks during
> DMA. The errors showed up when doing a ZFS scrub and
> I caught the
> problem in time.

Yup - that's exactly the kind of error that ZFS and WAFL do a perhaps uniquely good job of catching. Of course, buggy hardware can cause errors that trash your data in RAM beyond any hope of detection by ZFS, but (again, other things being equal) I agree that the more ways you have to detect them, the better. That said, it would be interesting to know who made this buggy hardware.
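For reference, the 12 TB figure above follows directly from the quoted spec; a quick check (assuming one detectable-but-unrecoverable error per 10^14 bits read):

    # Sanity check of the 10^14-bit / 12 TB figure.
    bits_per_error = 1e14                        # spec: one unrecoverable error per 1e14 bits
    tb_per_error = bits_per_error / 8 / 1e12     # bits -> bytes -> terabytes
    print("one expected error per ~%.1f TB read" % tb_per_error)   # ~12.5 TB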
...> Like others have said for big business; as a consumer > I can reasonably > comforably buy off the shelf cheap controllers and > disks, and know > that should any part of the system be flaky enough to > cause data > corruption the software layer will catch it which > both saves money and > creates peace of mind.CERN was using relatively cheap disks and found that they were more than adequate (at least for any normal consumer use) without that additional level of protection: the incidence of errors, even including the firmware errors which presumably would not have occurred in a normal consumer installation lacking hardware RAID, was on the order of 1 per TB - and given that it''s really, really difficult for a consumer to come anywhere near that much data without most of it being video files (which just laugh and keep playing when they discover small errors) that''s pretty much tantamount to saying that consumers would encounter no *noticeable* errors at all. Your position is similar to that of an audiophile enthused about a measurable but marginal increase in music quality and trying to convince the hoi polloi that no other system will do: while other audiophiles may agree with you, most people just won''t consider it important - and in fact won''t even be able to distinguish it at all. - bill This message posted from opensolaris.org
can you guess? wrote:
> CERN was using relatively cheap disks and found that they were more
> than adequate (at least for any normal consumer use) without that
> additional level of protection: the incidence of errors, even
> including the firmware errors which presumably would not have occurred
> in a normal consumer installation lacking hardware RAID, was on the
> order of 1 per TB - and given that it's really, really difficult for a
> consumer to come anywhere near that much data without most of it being
> video files (which just laugh and keep playing when they discover
> small errors) that's pretty much tantamount to saying that consumers
> would encounter no *noticeable* errors at all.

bull*
 -- richard
Thanks for the detailed reply, Robert. A significant part of it seems to be suggesting that high-end array hardware from multiple vendors may be *introducing* error sources that studies like CERN''s (and Google''s, and CMU''s) never encountered (based, as they were, on low-end hardware). If so, then at least a major part of your improved experience is not due to using ZFS per se but to getting rid of the high-end equipment and using more reliable commodity parts: a remarkable thought - I wonder if anyone has ever done that kind of a study. A quick Google of ext3 fsck did not yield obvious examples of why people needed to run fsck on ext3, though it did remind me that by default ext3 runs fsck just for the hell of it every N (20?) mounts - could that have been part of what you were seeing? There are two problems with over-hyping a product: it gives competitors something legitimate to refute, and it leaves the impression that the product has to be over-sold because it doesn''t have enough *real* merits to stand on. Well, for people like me there''s a third problem: we just don''t like spin. When a product has legitimate strengths that set it out out from the pack, it seems a shame to over-sell it the same way that a mediocre product is sold and waste the opportunity to take the high ground that it actually does own. I corrected your misunderstanding about WAFL''s separate checksums in my October 26th response to you in http://storagemojo.com/2007/10/25/sun-fires-back-at-netapp/ - though in that response I made a reference to something that I seem to have said somewhere (I have no idea where) other than in that thread. In any event, one NetApp paper detailing their use is 3356.pdf (first hit if you Google "Introduction to Data ONTAP 7G") - search for ''checksum'' and read about block and zone checksums in locations separate from the data that they protect. As just acknowledged above, I occasionally recall something incorrectly. I now believe that the mechanisms described there were put in place more to allow use of disks with standard 512-byte sector sizes than specifically to separate the checksums from the data, and that while thus separating the checksums may achieve a result similar to ZFS''s in-parent checksums the quote that you provided may indicate the primary mechanism that WAFL uses to validate its data: whether the ''checksums'' reside with the data or elsewhere, I now remember reading (I found the note that I made years ago, but it didn''t provide a specific reference and I just spent an hour searching NetApp''s Web site for it without success) that the in-block (or near-to-block) ''checksums'' include not only file identity and offset information but a block generation number (I think this is what the author meant by the ''identity'' of the block) that increments each time the block is updated, and that this generation number is kept in the metadata block that points to the file block, thus allowing the metadata block to verify with a high degree of certainty that the target block is indeed not only the right file block, containing the right data, but the right *version* of that block. As I said, thanks (again) for the detailed response, - bill This message posted from opensolaris.org
Let's stop feeding the troll...

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org on behalf of Richard Elling
Sent: Thu 11/8/2007 11:45 PM
To: can you guess?
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Yager on ZFS

can you guess? wrote:
> CERN was using relatively cheap disks and found that they were more
> than adequate (at least for any normal consumer use) without that
> additional level of protection: the incidence of errors, even
> including the firmware errors which presumably would not have occurred
> in a normal consumer installation lacking hardware RAID, was on the
> order of 1 per TB - and given that it's really, really difficult for a
> consumer to come anywhere near that much data without most of it being
> video files (which just laugh and keep playing when they discover
> small errors) that's pretty much tantamount to saying that consumers
> would encounter no *noticeable* errors at all.

bull*
 -- richard
> bull*
>  -- richard

Hmmm. Was that "bull*" as in

"Numbers? We don't need no stinking numbers! We're so cool that we work for a guy who thinks he's Steve Jobs!"

or

"Silly engineer! Can't you see that I've got my rakish Marketing hat on? Backwards!"

or

"I jes got back from an early start on my weekend an you better [hic] watch what you say, buddy, if you [Hic] don't want to get a gallon of [HIC] slightly-used beer and nachos all over your [HIC!] shoes [HIIICCCKKKK!!!] oh, sh- [BLARRGHHHHHHHH]"

Inquiring minds want to know.

- bill


This message posted from opensolaris.org
can you guess? wrote:> CERN was using relatively cheap disks and found that they were more than adequate (at least for any normal consumer use) without that additional level of protection: the incidence of errors, even including the firmware errors which presumably would not have occurred in a normal consumer installation lacking hardware RAID, was on the order of 1 per TB - and given that it''s really, really difficult for a consumer to come anywhere near that much data without most of it being video files (which just laugh and keep playing when they discover small errors) that''s pretty much tantamount to saying that consumers would encounter no *noticeable* errors at all. >I haven''t played with bit errors in video. A bit error in a JPEG generally corrupts everything after that point. And it''s pretty easy for people to have a TB or so of image files of various sorts. Furthermore, I''m interested in archiving those for at least the rest of my life. Because I''m in touch with a number of professional photographers, who have far more pictures than I do, I think of 1TB as a level a lot of people are using in a non-IT context, with no professional sysadmin involved in maintaining or designing their storage schemes. I think all of these are good reasons why people *do* care about errors at the levels you mention. One of my photographer friends found a bad cable in one of his computers that was upping his error rate by an order of magnitude (to 10^-13 I think). Having ZFS would have made this less dangerous, and detected it more quickly. Generally, I think you underestimate the amount of data some people have, and how much they care about it. I can''t imagine this will decrease significantly over the next decade, either. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Most video formats are designed to handle errors--they''ll drop a frame or two, but they''ll resync quickly. So, depending on the size of the error, there may be a visible glitch, but it''ll keep working. Interestingly enough, this applies to a lot of MPEG-derived formats as well, like MP3. I had a couple bad copies of MP3s that I tried to listen to on my computer a few weeks ago (podcasts copied via bluetooth off of my phone, apparently with no error checking), and it made the story hard to follow when a few seconds would disappear out of the middle, but it didn''t destroy the file. Scott On 11/9/07, David Dyer-Bennet <dd-b at dd-b.net> wrote:> can you guess? wrote: > > > CERN was using relatively cheap disks and found that they were more than adequate (at least for any normal consumer use) without that additional level of protection: the incidence of errors, even including the firmware errors which presumably would not have occurred in a normal consumer installation lacking hardware RAID, was on the order of 1 per TB - and given that it''s really, really difficult for a consumer to come anywhere near that much data without most of it being video files (which just laugh and keep playing when they discover small errors) that''s pretty much tantamount to saying that consumers would encounter no *noticeable* errors at all. > > > > I haven''t played with bit errors in video. A bit error in a JPEG > generally corrupts everything after that point. And it''s pretty easy > for people to have a TB or so of image files of various sorts. > Furthermore, I''m interested in archiving those for at least the rest of > my life. > > Because I''m in touch with a number of professional photographers, who > have far more pictures than I do, I think of 1TB as a level a lot of > people are using in a non-IT context, with no professional sysadmin > involved in maintaining or designing their storage schemes. > > I think all of these are good reasons why people *do* care about errors > at the levels you mention. > > One of my photographer friends found a bad cable in one of his computers > that was upping his error rate by an order of magnitude (to 10^-13 I > think). Having ZFS would have made this less dangerous, and detected it > more quickly. > > Generally, I think you underestimate the amount of data some people > have, and how much they care about it. I can''t imagine this will > decrease significantly over the next decade, either. > > -- > David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ > Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ > Photos: http://dd-b.net/photography/gallery/ > Dragaera: http://dragaera.info > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
Hello can,

Friday, November 9, 2007, 8:16:12 AM, you wrote:

cyg> If so, then at least a major part of your improved experience is
cyg> not due to using ZFS per se but to getting rid of the high-end
cyg> equipment and using more reliable commodity parts: a remarkable
cyg> thought - I wonder if anyone has ever done that kind of a study.

I wouldn't say high-end - as I said, I have yet to see a checksum error reported by ZFS on SAS/FC arrays. However, on midrange arrays with SATA disks it has happened many times. Just a quick look at an IBM page, for example:

http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-64357&brandind=5000008

"85750 (85456) Data corruption after large RAID 5 Volume Group with 2 LUNs Fix
92372 (92293) Media Scan with Redundancy Check caused data corruption (2)
89814 IO device error or data corruption on DS4100"

http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=MIGR-55696

"269911 85753 85456 Data corruption after large VolGrp with 2 luns
269921 80084 79485 79298 Data corruption occurred after spindown of GHS drive during copy back followed shortly by reset of the controller.
269923 80468 80110 Data corruption reported during Failover and DSS
269926 80756 78907 Data corruption while running I/O to flashcopy on fastt600 where the LBA in the head and tail did match as expected
79260 (RW #78101) (CL #76791) Fixed data corruption problem (1)
79485 (CL #79298) Fix data corruption occurred after spindown of GHS
79704 (CL #79212) TD_PT3574 Fix data corruption during tripple fault test"

And you can find more and more...

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
> A quick Google of ext3 fsck did not yield obvious examples of why people needed to run fsck on ext3, though it did remind me that by default ext3 runs fsck just for the hell of it every N (20?) mounts - could that have been part of what you were seeing?

I'm not sure if that's what Robert meant, but that's been my experience with ext3. In fact, that little behavior caused a rather lengthy bit of downtime for another company in our same colo facility this week, as a result of a facility-required reboot.

Frankly, ext3 is an abortion of a filesystem. I'm somewhat surprised it's being used as a counterexample of journaling filesystems being no less reliable than ZFS. XFS or ReiserFS are both better examples than ext3.

The primary use case for end-to-end checksumming in our environment has been exonerating the storage path when data corruption occurs. It's been crucial in a couple of instances in proving to our DB vendor that the corruption was caused by their code and not the OS, drivers, HBA, FC network, array, etc.

Best Regards,
Jason
On Fri, Nov 09, 2007 at 12:11:48 -0700, Jason J. W. Williams wrote:
: I'm somewhat surprised it's being used as
: a counterexample of journaling filesystems being no less reliable than
: ZFS. XFS or ReiserFS are both better examples than ext3.

I tend to use XFS on my Linux boxes because of it. ReiserFS I consider dangerous: if merely having an image of a ReiserFS filesystem on a ReiserFS filesystem is enough for fsck to screw everything up, it doesn't pass my 'good gods, just *what* were they *thinking*' test. I'd use XFS or JFS over the others, any day.

ZFS would be lovely. Pity about the licence issues.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
Dickon Hood <dickon-ml at fluff.org> wrote:

> ZFS would be lovely. Pity about the licence issues.

There is no license issue: the CDDL allows a combination with any other license, and the GPL does not forbid a GPL project to use code under other licenses in case the non-GPL code does not become a derived work of the GPL code.

In case of a filesystem, I do not see why the filesystem could be a derived work from e.g. Linux.

If people wanted to, they could use ZFS in Linux and nobody would complain......

The problem is, in the first place, politics; then technical problems.

Jörg

-- 
 EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
        js at cs.tu-berlin.de                 (uni)
        schilling at fokus.fraunhofer.de      (work)  Blog: http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On Fri, Nov 09, 2007 at 21:34:35 +0100, Joerg Schilling wrote: : Dickon Hood <dickon-ml at fluff.org> wrote: : > ZFS would be lovely. Pity about the licence issues. : There is no license issue: the CDDL allows a combination : with any other license and the GPL does not forbid a GPL : project to use code under other licenses in case that the : non-GPL code does not become a derived work of the the GPL : code. I happen to agree with you, but unfortunately those in charge of the kernel don''t. Licence politics are annoying, but still fall under ''licence issues'' in my book. : In case of a filesystem, I do not see why the filesystem could : be a derived work from e.g. Linux. Indeed not, however AIUI the FSF do. : If people did like, they could use ZFS in Linux and nobody would : complian...... I can''t see why it isn''t possible to maintain an out-of-tree implementation -- after all, the issues with mixing GPL and non-GPL code only come about on redistribution -- but I don''t see anyone doing this. I''d give it a bash myself, but I have time issues at the moment, and as my knowledge of kernel internals (of any Unixoid) is rather lacking at the moment, would involve quite a learning curve. Pity. : The problem is in the first priority, politics then technical problems. Agreed. -- Dickon Hood Due to digital rights management, my .sig is temporarily unavailable. Normal service will be resumed as soon as possible. We apologise for the inconvenience in the meantime. No virus was found in this outgoing message as I didn''t bother looking.
Dickon Hood <dickon-ml at fluff.org> wrote:> On Fri, Nov 09, 2007 at 21:34:35 +0100, Joerg Schilling wrote: > : Dickon Hood <dickon-ml at fluff.org> wrote: > > : > ZFS would be lovely. Pity about the licence issues. > > : There is no license issue: the CDDL allows a combination > : with any other license and the GPL does not forbid a GPL > : project to use code under other licenses in case that the > : non-GPL code does not become a derived work of the the GPL > : code. > > I happen to agree with you, but unfortunately those in charge of the > kernel don''t. > > Licence politics are annoying, but still fall under ''licence issues'' in my > book.We cannot change other people. We can try to read with them and may have luck. I am happy with the fact that during the past 6 months the unfriendly expressions against OpenSolaris have become very rare compared to 2-3 years ago. Maybe there is hope in a not too far future. FreeBSD and Mac OS X create facts now. J?rg -- EMail:joerg at schily.isdn.cs.tu-berlin.de (home) J?rg Schilling D-13353 Berlin js at cs.tu-berlin.de (uni) schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
> Most video formats are designed to handle
> errors--they'll drop a frame
> or two, but they'll resync quickly. So, depending on
> the size of the
> error, there may be a visible glitch, but it'll keep
> working.

Actually, let's take MPEG as an example. There are two basic frame types, anchor frames and predictive frames. Of the predictive frames, there are one-way predictive and multi-way predictive. The predictive frames offer significantly more compression than anchor frames, and thus are favored in more highly compressed streams. However, if an error occurs on a frame, that error will propagate until it either moves off the frame, or an anchor frame is reached.

In broadcast, they typically space the anchor frames every half second, to bound the time it takes to start a new stream when changing channels. However, this also means that an error may take up to a half second to recover from. Depending upon the type of error, this could be confined to a single block, a stripe, or even a whole frame.

On more bandwidth-constrained systems, like teleconferencing, I've seen anchor frames spaced as much as 30 seconds apart. These usually included some minimal error concealment techniques, but aren't really robust.

So I guess it depends upon what you mean by "recover fast". It could be as short as a fraction of a second, but could be several seconds.


This message posted from opensolaris.org
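Putting rough numbers on that (assuming a corrupted predictive frame stays wrong until the next anchor frame arrives and a 30 fps stream; the two spacings below are just the examples mentioned above):

    # Worst-case visible-error duration as a function of anchor-frame spacing.
    def worst_case_glitch_seconds(frames_between_anchors, frames_per_second):
        return frames_between_anchors / frames_per_second

    print(worst_case_glitch_seconds(15, 30))    # broadcast-style spacing: ~0.5 s
    print(worst_case_glitch_seconds(900, 30))   # sparse anchors (teleconferencing): ~30 s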
> > Most video formats are designed to handle > > errors--they''ll drop a frame > > or two, but they''ll resync quickly. So, depending > on > > the size of the > > error, there may be a visible glitch, but it''ll > keep > > working. > > Actually, Let''s take MPEG as an example. There are > two basic frame types, anchor frames and predictive > frames. Of the predictive frames, there are one-way > predictive and multi-way predictive. The predictive > frames offer significantly more compression than > anchor frames, and thus are favored in higher > compressed streams. However, if an error occurs on a > frame, that error will propagate until it either > moves off the frame, or an anchor frame is reached. > > In broadcast, they typically space the anchor frames > every half second, to bound the time it takes to > start a new stream when changing channels. However, > this also means that an error may take up to a half > second to recover. Depending upon the type of error, > this could be confined to a single block, a stripe, > or even a whole frame. > > On more constraint bandwidth systems, like > teleconferencing, I''ve seen anchor frames spaced as > much as 30 seconds apart. These usually included > some minimal error concealment techniques, but aren''t > really robust. > > So I guess it depends upon what you mean by "recover > fast". It could be as short as a fraction of a > second, but could be several seconds.Ah - thanks to both of you. My own knowledge of video format internals is so limited that I assumed most people here would be at least equally familiar with the notion that a flipped bit or two in a video would hardly qualify as any kind of disaster (or often even as being noticeable, unless one were searching for it, in the case of commercial-quality video). David''s comment about jpeg corruption would be more worrisome if it were clear that any significant number of ''consumers'' (the immediate subject of my original comment in this area) had anything approaching 1 TB of jpegs on their systems (which at an average of 1 MB per jpeg would be around a million pictures...). If you include ''image files of various sorts'', as he did (though this also raises the question of whether we''re still talking about ''consumers''), then you also have to specify exactly how damaging single-bit errors are to those various ''sorts'' (one might guess not very for the uncompressed formats that might well be taking up most of the space). And since the CERN study seems to suggest that the vast majority of errors likely to be encountered at this level of incidence (and which could be caught by ZFS) are *detectable* errors, they''ll (in the unlikely event that you encounter them at all) typically only result in requiring use of a RAID (or backup) copy (surely one wouldn''t be entrusting data of any real value to a single disk). So I see no reason to change my suggestion that consumers just won''t notice the level of increased reliability that ZFS offers in this area: not only would the difference be nearly invisible even if the systems they ran on were otherwise perfect, but in the real world consumers have other reliability issues to worry about that occur multiple orders of magnitude more frequently than the kinds that ZFS protects against. - bill This message posted from opensolaris.org
> : In case of a filesystem, I do not see why the filesystem could
> : be a derived work from e.g. Linux.
>
> Indeed not, however AIUI the FSF do.

My impression is that GPFS on Linux was (and may still be) provided as a binary proprietary loadable kernel module, plus some GPL glue. Not by any means an ideal solution, but a solution (at least one that's acceptable to Linus; and while other Linux developers and the FSF may be less sanguine about loadable proprietary kernel modules, they haven't made any real fuss for so many years that one might suspect they're resigned to them).

- bill

This message posted from opensolaris.org
can you guess? wrote: ...> Ah - thanks to both of you. My own knowledge of video format internals > is so limited that I assumed most people here would be at least equally > familiar with the notion that a flipped bit or two in a video would > hardly qualify as any kind of disaster (or often even as being > noticeable, unless one were searching for it, in the case of > commercial-quality video). > > David''s comment about jpeg corruption would be more worrisome if it were > clear that any significant number of ''consumers'' (the immediate subject > of my original comment in this area) had anything approaching 1 TB of > jpegs on their systems (which at an average of 1 MB per jpeg would be > around a million pictures...). If you include ''image files of various > sorts'', as he did (though this also raises the question of whether we''re > still talking about ''consumers''), then you also have to specify exactly > how damaging single-bit errors are to those various ''sorts'' (one might > guess not very for the uncompressed formats that might well be taking up > most of the space). And since the CERN study seems to suggest that the > vast majority of errors likely to be encountered at this level of > incidence (and which could be caught by ZFS) are *detectable* errors, > they''ll (in the unlikely event that you encounter them at all) typically > only result in requiring use of a RAID (or backup) copy (surely one > wouldn''t be entrusting data of any real value to a single disk).I have to comment here. As a bloke with a bit of a photography habit - I have a 10Mpx camera and I shoot in RAW mode - it is very, very easy to acquire 1Tb of image files in short order. Each of the photos I take is between 8 and 11Mb, and if I''m at a sporting event or I''m travelling for work or pleasure, it is *incredibly* easy to amass several hundred Mb of photos every single day. I''m by no means a professional photographer (so I''m not out taking photos every single day), although a very close friend of mine is. My photo storage is protected by ZFS with mirroring and backups to dvd media. My profotog friend has 3 copies of all her data - working set, immediate copy on usb-attached disk, and second backup also on usb-attached disk but disconnected. Even if you''ve got your original file archived, you still need your working copies available, and Adobe Photoshop can turn that RAW file into a PSD of nearly 60Mb in some cases. It is very easy for the storage medium to acquire some degree of corruption - whether it''s a CF or SD card, they all use FAT32. I have been in the position of losing photos due to this. Not many - perhaps a dozen over the course of 12 months. That flipped bit which you seem to be dismissing as "hardly... a disaster" can in fact make your photo file totally useless, because not only will you probably not be able to get the file off the media card, but whatever software you''re using to keep track of your catalog will also be unable to show you the entire contents. That might be the image itself, or it might be the equally important EXIF information. I don''t depend on FAT32-formatted media cards to make my living, fortunately, but if I did I imagine I''d probably end up only using each card for about a month before exercising caution and purchasing a new one rather than depending on the card itself to be reliable any more. 1Tb of photos shot on a 10MPx camera in the camera''s native RAW format is around 100,000 photos. 
It''s not difficult to imagine a "consumer" having that sort of storage requirement. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
can you guess? <billtodd at metrocast.net> wrote:

> > : In case of a filesystem, I do not see why the filesystem could
> > : be a derived work from e.g. Linux.
> >
> > Indeed not, however AIUI the FSF do.
>
> My impression is that GPFS on Linux was (and may still be) provided as a binary proprietary loadable kernel module, plus some GPL glue. Not by any means an ideal solution, but a solution (at least one that's acceptable to Linus, and while other Linux developers and the FSF may be less sanguine about loadable proprietary kernel modules they haven't made any real fuss for so many years that one might suspect they're resigned to them).

It is obvious that even RMS concurs that there is no problem, as he did not sue Veritas for publishing a modified version of GNU tar. This Veritas variant of GNU tar:

- comes with makefiles and a few inline modifications to the original GNU tar source
- needs some libraries that are not under the GPL in order to link
- these libraries are _not_ part of the "work", as they have been written for another purpose
- these libraries are not published in source but binary only
- GPL §3 explains that there is a difference between "the work" and the "executable work". For the latter you need only supply everything required to reproduce the binary.

Also note that Eben Moglen did explain why there is no problem with non-GPL code used by GPL projects in his talk at the press conference for the first GPLv3 draft.

To understand RMS, you need to be very careful interpreting what he says. He often says things in order to make people believe that the GPL is more restrictive than it actually is. This habit shows that he himself believes differently.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On 9-Nov-07, at 2:45 AM, can you guess? wrote:>>> Au contraire: I estimate its worth quite >> accurately from the undetected error rates reported >> in the CERN "Data Integrity" paper published last >> April (first hit if you Google ''cern "data >> integrity"''). >>> >>>> While I have yet to see any checksum error >> reported >>>> by ZFS on >>>> Symmetrix arrays or FC/SAS arrays with some other >>>> "cheap" HW I''ve seen >>>> many of them >>> >>> While one can never properly diagnose anecdotal >> issues off the cuff in a Web forum, given CERN''s >> experience you should probably check your >> configuration very thoroughly for things like >> marginal connections: unless you''re dealing with a >> far larger data set than CERN was, you shouldn''t have >> seen ''many'' checksum errors. >> >> Well single bit error rates may be rare in normal >> operation hard >> drives, but from a systems perspective, data can be >> corrupted anywhere >> between disk and CPU. > > The CERN study found that such errors (if they found any at all, > which they couldn''t really be sure of) were far less common than > the manufacturer''s spec for plain old detectable but unrecoverable > bit errors or to the one hardware problem that they discovered (a > disk firmware bug that appeared related to the unusual demands and > perhaps negligent error reporting of their RAID controller and > caused errors at a rate about an order of magnitude higher than the > nominal spec for detectable but unrecoverable errors). > > This suggests that in a ZFS-style installation without a hardware > RAID controller they would have experienced at worst a bit error > about every 10^14 bits or 12 TBAnd how about FAULTS? hw/firmware/cable/controller/ram/...> (the manufacturer''s spec rate for detectable but unrecoverable > errors) - though some studies suggest that the actual incidence of > ''bit rot'' is considerably lower than such specs. Furthermore, > simply scrubbing the disk in the background (as I believe some open- > source LVMs are starting to do and for that matter some disks are > starting to do themselves) would catch virtually all such errors in > a manner that would allow a conventional RAID to correct them, > leaving a residue of something more like one error per PB that ZFS > could catch better than anyone else save WAFL. > > I know you''re not interested >> in anecdotal >> evidence, > > It''s less that I''m not interested in it than that I don''t find it > very convincing when actual quantitative evidence is available that > doesn''t seem to support its importance. I know very well that > things like lost and wild writes occur, as well as the kind of > otherwise undetected bus errors that you describe, but the > available evidence seems to suggest that they occur in such small > numbers that catching them is of at most secondary importance > compared to many other issues. All other things being equal, I''d > certainly pick a file system that could do so, but when other > things are *not* equal I don''t think it would be a compelling > attraction. > > but I had a box that was randomly >> corrupting blocks during >> DMA. The errors showed up when doing a ZFS scrub and >> I caught the >> problem in time. 
> > Yup - that's exactly the kind of error that ZFS and WAFL do a
> > perhaps uniquely good job of catching.

WAFL can't catch all of them: it's distantly isolated from the CPU end.

> Of course, buggy hardware can cause errors that trash your data
> in RAM beyond any hope of detection by ZFS, but (again, other
> things being equal) I agree that the more ways you have to detect
> them, the better. That said, it would be interesting to know who
> made this buggy hardware.
>
> ...
>
>> Like others have said for big business; as a consumer
>> I can reasonably comfortably buy off the shelf cheap controllers and
>> disks, and know that should any part of the system be flaky enough to
>> cause data corruption the software layer will catch it, which
>> both saves money and creates peace of mind.
>
> CERN was using relatively cheap disks

Don't forget every other component in the chain.

> and found that they were more than adequate (at least for any
> normal consumer use) without that additional level of protection:
> the incidence of errors, even including the firmware errors which
> presumably would not have occurred in a normal consumer
> installation lacking hardware RAID, was on the order of 1 per TB -
> and given that it's really, really difficult for a consumer to come
> anywhere near that much data without most of it being video files
> (which just laugh and keep playing when they discover small errors)
> that's pretty much tantamount to saying that consumers would
> encounter no *noticeable* errors at all.
>
> Your position is similar to that of an audiophile enthused about a
> measurable but marginal increase in music quality and trying to
> convince the hoi polloi that no other system will do: while other
> audiophiles may agree with you, most people just won't consider it
> important - and in fact won't even be able to distinguish it at all.

Data integrity *is* important.

--Toby

>
> - bill
>
> This message posted from opensolaris.org
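For reference, a minimal Python sketch (purely illustrative) of the arithmetic behind the "10^14 bits, or about 12 TB" figure quoted earlier in this exchange; it simply converts a specified unrecoverable-bit-error rate into expected errors for a given amount of data read.

    # Convert a spec'd unrecoverable bit-error rate into expected errors
    # for a given amount of data read. One error per 1e14 bits is the
    # consumer-drive spec figure discussed above.

    def expected_errors(tb_read, bits_per_error=1e14):
        bits_read = tb_read * 1e12 * 8        # decimal terabytes -> bits
        return bits_read / bits_per_error

    for tb in (1, 12, 100):
        print(f"{tb:>3} TB read -> ~{expected_errors(tb):.2f} expected unrecoverable bit errors")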
On 9-Nov-07, at 3:23 PM, Scott Laird wrote:> Most video formats are designed to handle errors--they''ll drop a frame > or two, but they''ll resync quickly. So, depending on the size of the > error, there may be a visible glitch, but it''ll keep working. > > Interestingly enough, this applies to a lot of MPEG-derived formats as > well, like MP3. I had a couple bad copies of MP3s that I tried to > listen to on my computer a few weeks ago (podcasts copied via > bluetooth off of my phone, apparently with no error checking), and it > made the story hard to follow when a few seconds would disappear out > of the middle, but it didn''t destroy the file.Well that''s nice. How about your database, your source code, your ZIP file, your encrypted file, ... --T> > > Scott > > On 11/9/07, David Dyer-Bennet <dd-b at dd-b.net> wrote: >> can you guess? wrote: >> >>> CERN was using relatively cheap disks and found that they were >>> more than adequate (at least for any normal consumer use) without >>> that additional level of protection: the incidence of errors, >>> even including the firmware errors which presumably would not >>> have occurred in a normal consumer installation lacking hardware >>> RAID, was on the order of 1 per TB - and given that it''s really, >>> really difficult for a consumer to come anywhere near that much >>> data without most of it being video files (which just laugh and >>> keep playing when they discover small errors) that''s pretty much >>> tantamount to saying that consumers would encounter no >>> *noticeable* errors at all. >>> >> >> I haven''t played with bit errors in video. A bit error in a JPEG >> generally corrupts everything after that point. And it''s pretty easy >> for people to have a TB or so of image files of various sorts. >> Furthermore, I''m interested in archiving those for at least the >> rest of >> my life. >> >> Because I''m in touch with a number of professional photographers, who >> have far more pictures than I do, I think of 1TB as a level a lot of >> people are using in a non-IT context, with no professional sysadmin >> involved in maintaining or designing their storage schemes. >> >> I think all of these are good reasons why people *do* care about >> errors >> at the levels you mention. >> >> One of my photographer friends found a bad cable in one of his >> computers >> that was upping his error rate by an order of magnitude (to 10^-13 I >> think). Having ZFS would have made this less dangerous, and >> detected it >> more quickly. >> >> Generally, I think you underestimate the amount of data some people >> have, and how much they care about it. I can''t imagine this will >> decrease significantly over the next decade, either. >> >> -- >> David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ >> Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ >> Photos: http://dd-b.net/photography/gallery/ >> Dragaera: http://dragaera.info >> >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >> > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
can you guess? wrote:> Ah - thanks to both of you. My own knowledge of video format internals is so limited that I assumed most people here would be at least equally familiar with the notion that a flipped bit or two in a video would hardly qualify as any kind of disaster (or often even as being noticeable, unless one were searching for it, in the case of commercial-quality video). >But also, you''re thinking like a consumer, not like an archivist. A bit lost in an achival video *is* a disaster, or at least a serious degradation.> David''s comment about jpeg corruption would be more worrisome if it were clear that any significant number of ''consumers'' (the immediate subject of my original comment in this area) had anything approaching 1 TB of jpegs on their systems (which at an average of 1 MB per jpeg would be around a million pictures...). If you include ''image files of various sorts'', as he did (though this also raises the question of whether we''re still talking about ''consumers''), then you also have to specify exactly how damaging single-bit errors are to those various ''sorts'' (one might guess not very for the uncompressed formats that might well be taking up most of the space). And since the CERN study seems to suggest that the vast majority of errors likely to be encountered at this level of incidence (and which could be caught by ZFS) are *detectable* errors, they''ll (in the unlikely event that you encounter them at all) typically only result in requiring use of a RAID (or backup) copy (surely > one wouldn''t be entrusting data of any real value to a single disk). >They''ll only be detected when the files are *read*; ZFS has the "scrub" concept, but most RAID systems don''t, so the error could persist for years (and through generations of backups) before anybody noticed.> So I see no reason to change my suggestion that consumers just won''t notice the level of increased reliability that ZFS offers in this area: not only would the difference be nearly invisible even if the systems they ran on were otherwise perfect, but in the real world consumers have other reliability issues to worry about that occur multiple orders of magnitude more frequently than the kinds that ZFS protects against. >And yet I know many people who have lost data in ways that ZFS would have prevented. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
> So I see no reason to change my suggestion that consumers just won't notice the level of increased reliability that ZFS offers in this area: not only would the difference be nearly invisible even if the systems they ran on were otherwise perfect, but in the real world consumers have other reliability issues to worry about that occur multiple orders of magnitude more frequently than the kinds that ZFS protects against.

Even if the errors are somehow detectable, you need the whole system infrastructure to be able to deal with that. After my MP3 collection got fucked up thanks to neither Windows nor NTFS being able to properly detect and report in-flight data corruption (i.e. a bad cable) while copying it from one drive to another to replace one of them, I'm really glad that I have ZFS to manage my data these days. It seemed to be an issue that existed well before the actual disk copy, because I'd always wondered about transient glitching in my music, with no issues being reported in the event log. Had ZFS, with its aggressive checksumming, been in use instead, I'd have noticed long before. Now the formerly transient glitches have become permanent.

As far as all these reliability studies go, my practical experience is quite the opposite: I'm fixing computers of friends and acquaintances left and right, and bad sectors are pretty common. While it would be impractical to introduce RAID or to put ditto blocks on virtually everything, being able to guarantee those users some safety (excluding complete drive failure) on at least their dearest data, like important documents or holiday pictures, is actually a nice thing. You don't get that with most other free filesystems (if at all). Granted, for the average user who isn't versed in such things, a ZFS status Gnome applet or the like would help a lot, because those people won't be doing occasional manual system checks like I do, and as such won't notice bigger issues.

-mg
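The corruption described above happened in flight, during a copy, and went unnoticed because nothing compared what was written with what was read back. A minimal, hypothetical Python sketch of that kind of end-to-end check follows; the function names are made up, and this illustrates the general principle of verifying a copy with checksums, not how ZFS itself does it.

    import hashlib, shutil, sys

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    def verified_copy(src, dst):
        """Copy src to dst, then re-read both and compare checksums.
        Note: the re-read of dst may be served from the page cache, so this
        is still weaker than a filesystem that checksums every block on disk."""
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"copy of {src} does not verify - possible in-flight corruption")

    if __name__ == "__main__":
        verified_copy(sys.argv[1], sys.argv[2])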
This is a bit weird: I just wrote the following response to a dd-b post that now seems to have disappeared from the thread. Just in case that's a temporary aberration, I'll submit it anyway as a new post.

> can you guess? wrote:
> > Ah - thanks to both of you. My own knowledge of video format internals
> > is so limited that I assumed most people here would be at least equally
> > familiar with the notion that a flipped bit or two in a video would
> > hardly qualify as any kind of disaster (or often even as being
> > noticeable, unless one were searching for it, in the case of
> > commercial-quality video).
>
> But also, you're thinking like a consumer,

Well, yes - since that's the context of my comment to which you originally responded. Did you manage to miss that, even after I repeated it above in the post to which you're responding *this* time?

> not like an archivist. A bit lost in an archival video *is* a disaster,
> or at least a serious degradation.

Or not, unless you're really, really obsessive-compulsive about it - certainly *far* beyond the point of being reasonably characterized as a 'consumer'.

> > ... And since the CERN study seems to suggest that the vast majority of
> > errors likely to be encountered at this level of incidence (and which
> > could be caught by ZFS) are *detectable* errors, they'll (in the unlikely
> > event that you encounter them at all) typically only result in requiring
> > use of a RAID (or backup) copy (surely one wouldn't be entrusting data of
> > any real value to a single disk).
>
> They'll only be detected when the files are *read*; ZFS has the "scrub"
> concept, but most RAID systems don't,

Perhaps you're just not very familiar with other systems, David.

For example, see http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID#Data_Scrubbing, where it tells you how to run a software RAID scrub manually (or presumably in a cron job if it can't be configured to be more automatic). Or a variety of Adaptec RAID cards which support two different forms of scanning/fixup, which presumably could also be scheduled externally if an internal scheduling mechanism is not included. I seriously doubt that these are the only such facilities out there: they're just ones I happen to be able to cite with minimal effort.

> ...
> > So I see no reason to change my suggestion that consumers just won't
> > notice the level of increased reliability that ZFS offers in this area:
> > not only would the difference be nearly invisible even if the systems
> > they ran on were otherwise perfect, but in the real world consumers have
> > other reliability issues to worry about that occur multiple orders of
> > magnitude more frequently than the kinds that ZFS protects against.
>
> And yet I know many people who have lost data in ways that ZFS would
> have prevented.

Specifics would be helpful here. How many? Can they reasonably be characterized as consumers (I'll remind you once more: *that's* the subject to which your comments purport to be responding)? Can the data loss reasonably be characterized as significant (to 'consumers')? Were the causes hardware problems that could reasonably have been avoided ('bad cables' might translate to 'improperly inserted, overly long, or severely kinked cables', for example - and such a poorly-constructed system will tend to have other problems that ZFS cannot address)?

- bill

This message posted from opensolaris.org
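The "copy everything to the null device from a cron job" suggestion above is essentially a read-only scrub. A minimal Python sketch of the same idea follows, under the assumption that an I/O error is what you are trying to surface; it exercises readability only, and unlike a block-checksumming scrub it cannot notice silently flipped bits.

    import os, sys

    def read_scrub(root, chunk=1 << 20):
        """Read every byte of every file under root, recording files that
        raise I/O errors. Surfaces latent unreadable sectors; does NOT
        detect silent bit flips."""
        bad = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        while f.read(chunk):
                            pass
                except OSError as e:
                    bad.append((path, e))
        return bad

    if __name__ == "__main__":
        for path, err in read_scrub(sys.argv[1]):
            print(f"UNREADABLE: {path}: {err}")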
> can you guess? wrote:
> ...
> > If you include 'image files of various sorts', as he did (though this
> > also raises the question of whether we're still talking about
> > 'consumers'), then you also have to specify exactly how damaging
> > single-bit errors are to those various 'sorts' (one might guess not very
> > for the uncompressed formats that might well be taking up most of the
> > space). And since the CERN study seems to suggest that the vast majority
> > of errors likely to be encountered at this level of incidence (and which
> > could be caught by ZFS) are *detectable* errors, they'll (in the unlikely
> > event that you encounter them at all) typically only result in requiring
> > use of a RAID (or backup) copy (surely one wouldn't be entrusting data of
> > any real value to a single disk).
>
> I have to comment here. As a bloke with a bit of a photography
> habit - I have a 10Mpx camera and I shoot in RAW mode - it is
> very, very easy to acquire 1Tb of image files in short order.

So please respond to the question that I raised above (and that you yourself quoted): just how much damage will a single-bit error do to such a RAW file?

> Each of the photos I take is between 8 and 11Mb, and if I'm
> at a sporting event or I'm travelling for work or pleasure,
> it is *incredibly* easy to amass several hundred Mb of photos
> every single day.

Even assuming that you meant 'MB' rather than 'Mb' above, that suggests that it would take you well over a decade to amass 1 TB of RAW data (assuming that, as you suggest both above and later, you didn't accumulate several hundred MB of pictures *every* day but just on those days when you were traveling, at a sporting event, etc.).

> I'm by no means a professional photographer (so I'm not out
> taking photos every single day), although a very close friend
> of mine is. My photo storage is protected by ZFS with mirroring
> and backups to dvd media. My profotog friend has 3 copies of
> all her data - working set, immediate copy on usb-attached disk,
> and second backup also on usb-attached disk but disconnected.

Sounds wise on both your parts - and probably makes ZFS's extra protection pretty irrelevant (I won't bother repeating why here).

> Even if you've got your original file archived, you still need
> your working copies available, and Adobe Photoshop can turn that
> RAW file into a PSD of nearly 60Mb in some cases.

If you really amass all your pictures this way (rather than, e.g., use Photoshop on some of them and then save the result in a less verbose format), I'll suggest that this takes you well beyond the 'consumer' range of behavior.

> It is very easy for the storage medium to acquire some degree
> of corruption - whether it's a CF or SD card, they all use
> FAT32. I have been in the position of losing photos due to
> this. Not many - perhaps a dozen over the course of 12 months.

So in those cases you didn't maintain multiple copies. Bad move, and usually nothing that using ZFS could help with. While I'm not intimately acquainted with flash storage, my impression is that data loss usually occurs due to bad writes (since once written the data just sits there persistently and AFAIK is not subject to the kinds of 'bit rot' that disk and tape data can experience). So if the loss occurs to the original image captured on flash before it can be copied elsewhere, you're just SOL and nothing ZFS offers could help you.

> That flipped bit which you seem to be dismissing as "hardly...
> a disaster" can in fact make your photo file totally useless,
> because not only will you probably not be able to get the file
> off the media card, but whatever software you're using to keep
> track of your catalog will also be unable to show you the
> entire contents. That might be the image itself, or it might
> be the equally important EXIF information.

Here come those pesky numbers again, I'm afraid. Because given that the size difference between your image data and the metadata (including EXIF information, if that's what I suspect it is) is at least several orders of magnitude, the chance that the bad bit will be in something other than the image data is pretty negligible.

So even if you can format your card to use ZFS (can you? if not, what possible relevance does your comment above have to this discussion?), doing so won't help at all: the affected file will still be inaccessible (unless you use ZFS to create a redundant pool across multiple such cards: is that really what you're suggesting should be done?) both to normal extraction (though couldn't dd normally get off everything but the bad sector?) and to your cataloging software.

> I don't depend on FAT32-formatted media cards to make my
> living, fortunately, but if I did I imagine I'd probably end
> up only using each card for about a month before exercising
> caution and purchasing a new one rather than depending on the
> card itself to be reliable any more.

The 'wear leveling' algorithms on current cards are supposedly quite good, so that degree of caution may no longer be necessary (at least if you get a decent card).

> 1Tb of photos shot on a 10MPx camera in the camera's native
> RAW format is around 100,000 photos. It's not difficult to
> imagine a "consumer" having that sort of storage requirement.

Well, as I noted above, in your own case it would take well over a decade to generate 'that sort of storage requirement'. Furthermore, it *is* difficult to imagine a 'consumer' keeping all of them in RAW format - and even if they did, the question still remains: just how much will a single-bit error affect such a photo?

- bill

This message posted from opensolaris.org
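For what it's worth, the arithmetic being argued over in this exchange is easy to lay out. A rough Python sketch: the ~10 MB-per-RAW figure comes from the posts above, while the shooting-day counts are assumptions chosen only to show the range between "several hundred MB on occasional days" and "several hundred MB every day".

    # Rough arithmetic behind the "how long to amass 1 TB of RAW files" exchange.

    MB_PER_PHOTO = 10          # ~8-11 MB RAW files from a 10 Mpx camera (figure from the posts)
    TB_IN_MB = 1_000_000       # 1 TB, decimal

    print("photos per TB:", TB_IN_MB // MB_PER_PHOTO)   # ~100,000 photos, as noted above

    # (MB per shooting day, shooting days per year) - illustrative assumptions only
    for mb_per_day, days_per_year in ((300, 60), (300, 365)):
        years = TB_IN_MB / (mb_per_day * days_per_year)
        print(f"{mb_per_day} MB on {days_per_year} days/year "
              f"-> ~{years:.0f} years to reach 1 TB")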
can you guess? wrote:> This is a bit weird: I just wrote the following response to a dd-b post that now seems to have disappeared from the thread. Just in case that''s a temporary aberration, I''ll submit it anyway as a new post. >Strange things certainly happen here now and then. The post you''re replying to is one I definitely did send in. Could I have messed up and sent it just to you, thus causing confusion when you read it, deleted it, remembered it as in the group rather than direct?>> can you guess? wrote: >> >>> Ah - thanks to both of you. My own knowledge of >>> >> video format internals is so limited that I assumed >> most people here would be at least equally familiar >> with the notion that a flipped bit or two in a video >> would hardly qualify as any kind of disaster (or >> often even as being noticeable, unless one were >> searching for it, in the case of commercial-quality >> video). >> >> But also, you''re thinking like a consumer, >> > > Well, yes - since that''s the context of my comment to which you originally responded. Did you manage to miss that, even after I repeated it above in the post to which you''re responding *this* time? >I''m a professional computer engineer, but not a professional photographer or archivist. Long before I got involved in computer archiving issues, I was concerned with "archival processing" of my film and prints, so they''d last a long long time. This was generally a big issue in the photo community, *more* among artists and amateurs than among professionals (the pros mostly cared about getting the assignment done and moving on). Similarly, historical sites like fanac.org (documenting some of the history of science-fiction fandom) are run by amateurs, not professionals. "Consumer" might cut across the set of people who do things with computers differently, I don''t know. I''d consider the fanac.org project a consumer use, in the sense that it''s not supported by either business income or a big foundation grant, doesn''t have paid staff or a budget for much beyond hosting and some backups, etc. But you may be using a more precise meaning of "consumer". Backing off slightly, my point is that lots of people who are not professionally trained or legally or economically required for their business to worry about archiving issues a lot choose to do so anyway. I''m sure you''ve seen friends messing with old family pictures and being very pleased to have them; you may well have done it yourself. (On the computer, physically, whichever.) I don''t see the world as a set of sharply-delineated disjoint categories; if you do, we''re going to have trouble reaching any sort of meeting of minds, maybe even communicating at all.> not like > >> an archivist. A bit >> lost in an achival video *is* a disaster, or at least >> a serious degradation. >> > > Or not, unless you''re really, really obsessive-compulsive about it - certainly *far* beyond the point of being reasonably characterized as a ''consumer''. >I disagree; lots of people are quite careful about some parts of their history -- old letters, family photos, videos of the kids, or all of them. Some people aren''t very careful and don''t mind if they slip away. But the ones who *do* care a lot aren''t that rare, I don''t believe. Or it could be other things -- the history of their town, their school, their family, their company, whatever.> ... 
> > And since the CERN study seems > >> to suggest that the vast majority of errors likely to >> be encountered at this level of incidence (and which >> could be caught by ZFS) are *detectable* errors, >> they''ll (in the unlikely event that you encounter >> them at all) typically only result in requiring use >> of a RAID (or backup) copy (surely >> >>> one wouldn''t be entrusting data of any real value >>> >> to a single disk). >> >> They''ll only be detected when the files are *read*; >> ZFS has the "scrub" >> concept, but most RAID systems don''t, >> > > Perhaps you''re just not very familiar with other systems, David. > > For example, see http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID#Data_Scrubbing, where it tells you how to run a software RAID scrub manually (or presumably in a cron job if it can''t be configured to be more automatic). Or a variety of Adaptec RAID cards which support two different forms of scanning/fixup which presumably could also be scheduled externally if an internal scheduling mechanism is not included). I seriously doubt that these are the only such facilities out there: they''re just ones I happen to be able to cite with minimal effort. >Okay, it''s good those are there too.> ... > > >>> So I see no reason to change my suggestion that >>> >> consumers just won''t notice the level of increased >> reliability that ZFS offers in this area: not only >> would the difference be nearly invisible even if the >> systems they ran on were otherwise perfect, but in >> the real world consumers have other reliability >> issues to worry about that occur multiple orders of >> magnitude more frequently than the kinds that ZFS >> protects against. >> >> And yet I know many people who have lost data in ways >> that ZFS would >> have prevented. >> > > Specifics would be helpful here. How many? Can they reasonably be characterized as consumers (I''ll remind you once more: *that''s* the subject to which your comments purport to be responding)? Can the data loss reasonably be characterized as significant (to ''consumers'')? Were the causes hardware problems that could reasonably have been avoided (''bad cables'' might translate to ''improperly inserted, overly long, or severely kinked cables'', for example - and such a poorly-constructed system will tend to have other problems that ZFS cannot address)? >"Reasonably avoided" is irrelevant; they *weren''t* avoided. And most people (including me) have not the slightest clue how to go about telling "good cables" from "bad cables" other than avoiding obviously-damaged ones and not buying from the peg that''s 1/3 the cost of all the other pegs (by examination; I could propose a set of tests to eventually tell the difference). (And that cheap peg *might* actually be just fine, I''m just a bit paranoid about exceptional pricing.) Nearly everybody I can think of who''s used a computer for more than a couple of years has stories of stuff they''ve lost. I knew a lot of people who lost their entire hard drive at one point or other especially in the 1985-1995 timeframe. The people were quite upset by the loss; I''m not going to accept somebody else deciding it''s "not significant". Writers faced this problem a lot. I''d classify that as a "consumer" use, but not an amateur one. Their computers are being bought by them at retail, not provided by a corporation and maintained by an IT department. It turns out, most people''s lives are fairly important to them. 
When people start having a lot of their life on their computer, it starts becoming important to them not to lose it. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
can you guess? wrote:>> >> I have to comment here. As a bloke with a bit of a >> photography >> habit - I have a 10Mpx camera and I shoot in RAW mode >> - it is >> very, very easy to acquire 1Tb of image files in >> short order. >> > > So please respond to the question that I raised above (and that you yourself quoted): just how much damage will a single-bit error do to such a RAW file? >In a compressed raw file, it''ll affect the rest of the file generally; so it essentially renders the whole thing useless, unless it happens to hit towards the end and you can crop around it. If it hits in metadata (statistically unlikely, the bulk of the file is image data) it''s probably at worst annoying, but it *might* hit one of the bits software uses to recognize and validate the file, too. In an uncompressed raw file, if it hits in image data it''ll affect probably 9 pixels; it''s easily fixed.>> Each of the photos I take is between 8 and 11Mb, and >> if I''m >> at a sporting event or I''m travelling for work or >> pleasure, >> it is *incredibly* easy to amass several hundred Mb >> of photos >> every single day. >> > > Even assuming that you meant ''MB'' rather than ''Mb'' above, that suggests that it would take you well over a decade to amass 1 TB of RAW data (assuming that, as you suggest both above and later, you didn''t accumulate several hundred MB of pictures *every* day but just on those days when you were traveling, at a sporting event, etc.). >I seem to come up with a DVD full every month or two these days, myself. I mean, it varies; there was this one weekend I filled 4 or some such; but it varies both ways, and that average isn''t too far off. 25GB a year seems to take 40 years to reach 1TB. However, my rate has increased so dramatically in the last 7 years that I''m not at all sure what to expect; is it time for the curve to level off yet, for me? Who knows! Then again, I''m *also* working on scanning in the *last* 40 years worth of photos, and those tend to be bigger (scans are less good pixels so you need more of them), and *that* runs the numbers up, in chunks when I take time to do a big scanning batch.>> I''m by no means a professional photographer (so I''m >> not out >> taking photos every single day), although a very >> close friend >> of mine is. My photo storage is protected by ZFS with >> mirroring >> and backups to dvd media. My profotog friend has 3 >> copies of >> all her data - working set, immediate copy on >> usb-attached disk, >> and second backup also on usb-attached disk but >> disconnected. >> > > Sounds wise on both your parts - and probably makes ZFS''s extra protection pretty irrelevant (I won''t bother repeating why here). > > >> Even if you''ve got your original file archived, you >> still need >> your working copies available, and Adobe Photoshop >> can turn that >> RAW file into a PSD of nearly 60Mb in some cases. >> > > If you really amass all your pictures this way (rather than, e.g., use Photoshop on some of them and then save the result in a less verbose format), I''ll suggest that this takes you well beyond the ''consumer'' range of behavior. >It''s not snapshot usage, but it''s common amateur usage. Amateurs tend to do lots of the same things professionals do (and sometimes better, though not usually). Hobbies are like that. The argument for the full Photoshop file is the concept of "nondestructive editing". I do retouching on new layers instead of erasing what I already have with the new stuff. 
I use adjustment layers with layer masks for curve adjustments. I can go back and improve the mask, or nudge the curves, without having to start over from scratch. It''s a huge win. And it may be more valuable for amateurs, actually; professionals tend to have the experience to know their minds better and know when they have it right, so many of them may do less revisiting old stuff and improving it a bit. Also, when the job is done and sent to the client, they tend not to care about it any more.>> It is very easy for the storage medium to acquire >> some degree >> of corruption - whether it''s a CF or SD card, they >> all use >> FAT32. I have been in the position of losing photos >> due to >> this. Not many - perhaps a dozen over the course of >> 12 months. >> > > So in those cases you didn''t maintain multiple copies. Bad move, and usually nothing that using ZFS could help with. While I''m not intimately acquainted with flash storage, my impression is that data loss usually occurs due to bad writes (since once written the data just sits there persistently and AFAIK is not subject ot the kinds of ''bit rot'' that disk and tape data can experience). So if the loss occurs to the original image captured on flash before it can be copied elsewhere, you''re just SOL and nothing ZFS offers could help you. >I don''t know a whole lot about flash cards either. The cameras I know the dataflow on are writing from an internal ram buffer to the card, so if the write error was reported at the time of write, the block could be mapped out and rewritten, since the data is still available. But anything after the file is reported successfully written to the card would, I believe, be too late. I''ve never lost a file to a card problem, or even to a human glitch involving card handling (though I did have to use a recovery utility once, for a human glitch). So far. I use cards generally until I get a camera that makes files so much bigger than the old camera that I have to buy new cards :-). I think there are more sequences of reasonable-seeming things to do with a card that result in corruption of FAT32 filesystem than there would be for a better filesystem. But the cards need to be formatted with a pretty least-common-denominator (or at least very very widely supported) filesystem, and ZFS clearly isn''t that right now. And of course when working with existing cameras, one needs to use whatever the camera understands (and that''s universally FAT32, so far as I can remember).>> That flipped bit which you seem to be dismissing as >> "hardly... >> a disaster" can in fact make your photo file totally >> useless, >> because not only will you probably not be able to get >> the file >> off the media card, but whatever software you''re >> using to keep >> track of your catalog will also be unable to show you >> the >> entire contents. That might be the image itself, or >> it might >> be the equally important EXIF information. >> > > Here come those pesky numbers again, I''m afraid. Because given that the size difference between your image data and the metadata (including EXIF information, if that''s what I suspect it is) is at least several orders of magnitude, the chance that the bad bit will be in something other than the image data is pretty negligible. > > So even if you can format your card to use ZFS (can you? 
if not, what possible relevance does your comment above have to this discussion?), doing so won''t help at all: the affected file will still be inaccessible (unless you use ZFS to create a redundant pool across multiple such cards: is that really what you''re suggesting should be done?) both to normal extraction (though couldn''t dd normally get off everything but the bad sector?) and to your cataloging software. >Hmmm; the new Nikon D3 supports dual CF slots, and I do believe that one of the modes is mirroring across the two slots. I think of that as for really *really* paranoid people. Personally, I *did* worry about my pictures being only on the one roll of film back when I shot film. The cameras currently in the field do not, of course, support writing to cards formatted with ZFS, so it''s not a practical choice today. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
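The compressed-versus-uncompressed distinction discussed above is easy to demonstrate with a toy Python experiment (zlib standing in for a compressed raw codec, a synthetic gradient standing in for sensor data; neither is a real camera format): one flipped bit in uncompressed data changes exactly one byte, while the same flip in the compressed copy typically breaks decompression outright or garbles everything from that point on.

    import zlib

    def flip_bit(data: bytes, bit_index: int) -> bytes:
        b = bytearray(data)
        b[bit_index // 8] ^= 1 << (bit_index % 8)
        return bytes(b)

    image = bytes(i % 256 for i in range(1 << 20))   # toy stand-in for 1 MB of sensor data
    compressed = zlib.compress(image)

    # Uncompressed: a single flipped bit changes exactly one byte.
    damaged = flip_bit(image, 4_000_000)
    print("bytes changed (uncompressed):", sum(a != b for a, b in zip(image, damaged)))

    # Compressed: the same kind of flip usually makes decompression fail,
    # or corrupts the stream from the flip point onward.
    damaged_z = flip_bit(compressed, (len(compressed) // 2) * 8)
    try:
        zlib.decompress(damaged_z)
        print("decompressed, but contents past the flip are unreliable")
    except zlib.error as e:
        print("decompression failed:", e)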
> can you guess? wrote: > > This is a bit weird: I just wrote the following > response to a dd-b post that now seems to have > disappeared from the thread. Just in case that''s a > temporary aberration, I''ll submit it anyway as a new > post. > > > > Strange things certainly happen here now and then. > > The post you''re replying to is one I definitely did > send in. Could I > have messed up and sent it just to you, thus causing > confusion when you > read it, deleted it, remembered it as in the group > rather than direct?I used the forum''s ''quote original'' feature in replying and then received a screen-full of Java errors saying that the parent post didn''t exist when I attempted to submit it. Most of the balance of your post isn''t addressed in any detail because it carefully avoids the fundamental issues that I raised: 1. How much visible damage does a single-bit error actually do to the kind of large photographic (e.g., RAW) file you are describing? If it trashes the rest of the file, as you state is the case with jpeg, then you might have a point (though you''d still have to address my second issue below), but if it results in a virtually invisible blemish they you most certainly don''t. 2. If you actually care about your data, you''d have to be a fool to entrust it to *any* single copy, regardless of medium. And once you''ve got more than one copy, then you''re protected (at the cost of very minor redundancy restoration effort in the unlikely event that any problem occurs) against the loss of any one copy due to a minor error - the only loss of non-negligible likelihood that ZFS protects against better than other file systems. If you''re relying upon RAID to provide the multiple copies - though this would also arguably be foolish, if only due to the potential for trashing all the copies simultaneously - you''d probably want to schedule occasional scrubs, just in case you lost a disk. But using RAID as a substitute for off-line redundancy is hardly suitable in the kind of archiving situations that you describe - and therefore ZFS has absolutely nothing of value to offer there: you should be using off-line copies, and occasionally checking all copies for readability (e.g., by copying them to the null device - again, something you could do for your on-line copy with a cron job and which you should do for your off-line copy/copies once in a while as well. In sum, your support of ZFS in this specific area seems very much knee-jerk in nature rather than carefully thought out - exactly the kind of ''over-hyping'' that I pointed out in my first post in this thread. ...> >> And yet I know many people who have lost data in > ways > >> that ZFS would > >> have prevented. > >> > > > > Specifics would be helpful here. How many? Can they > reasonably be characterized as consumers (I''ll remind > you once more: *that''s* the subject to which your > comments purport to be responding)? Can the data loss > reasonably be characterized as significant (to > ''consumers'')? Were the causes hardware problems that > could reasonably have been avoided (''bad cables'' > might translate to ''improperly inserted, overly long, > or severely kinked cables'', for example - and such a > poorly-constructed system will tend to have other > problems that ZFS cannot address)? 
> > > > "Reasonably avoided" is irrelevant; they *weren''t* > avoided.While that observation has at least some merit, I''ll observe that you jumped directly to the last of my questions above while carefully ignoring the three questions that preceded it. ...> Nearly everybody I can think of who''s used a computer > for more than a > couple of years has stories of stuff they''ve lost.Of course they have - and usually in ways that ZFS would have been no help whatsoever in mitigating. I> knew a lot of > people who lost their entire hard drive at one point > or other especially > in the 1985-1995 timeframe.Fine example of a situation where only redundancy can save you, and where good old vanilla-flavored RAID (with scrubbing - but, as I noted, that''s hardly something that ZFS has any corner on) provides comparable protection to ZFS-with-mirroring. The people were quite> upset by the loss; > I''m not going to accept somebody else deciding it''s > "not significant".I never said such situations were not significant, David: I simply observed (and did so again above) that in virtually all of them ZFS offered no particular advantage over more conventional means of protection. You need to get a grip and try to understand the *specifics* of what''s being discussed here if you want to carry on a coherent discussion about it. - bill This message posted from opensolaris.org
can you guess? wrote:>> can you guess? wrote: >> >>> This is a bit weird: I just wrote the following >>> >> response to a dd-b post that now seems to have >> disappeared from the thread. Just in case that''s a >> temporary aberration, I''ll submit it anyway as a new >> post. >> >>> >>> >> Strange things certainly happen here now and then. >> >> The post you''re replying to is one I definitely did >> send in. Could I >> have messed up and sent it just to you, thus causing >> confusion when you >> read it, deleted it, remembered it as in the group >> rather than direct? >> > > I used the forum''s ''quote original'' feature in replying and then received a screen-full of Java errors saying that the parent post didn''t exist when I attempted to submit it. > > Most of the balance of your post isn''t addressed in any detail because it carefully avoids the fundamental issues that I raised: >Not true; and by selective quoting you have removed my specific responses to most of these issues.> 1. How much visible damage does a single-bit error actually do to the kind of large photographic (e.g., RAW) file you are describing? If it trashes the rest of the file, as you state is the case with jpeg, then you might have a point (though you''d still have to address my second issue below), but if it results in a virtually invisible blemish they you most certainly don''t. >I addressed this quite specifically, for two cases (compressed raw vs. uncompressed raw) with different results.> 2. If you actually care about your data, you''d have to be a fool to entrust it to *any* single copy, regardless of medium. And once you''ve got more than one copy, then you''re protected (at the cost of very minor redundancy restoration effort in the unlikely event that any problem occurs) against the loss of any one copy due to a minor error - the only loss of non-negligible likelihood that ZFS protects against better than other file systems. >You have to detect the problem first. ZFS is in a much better position to detect the problem due to block checksums.> If you''re relying upon RAID to provide the multiple copies - though this would also arguably be foolish, if only due to the potential for trashing all the copies simultaneously - you''d probably want to schedule occasional scrubs, just in case you lost a disk. But using RAID as a substitute for off-line redundancy is hardly suitable in the kind of archiving situations that you describe - and therefore ZFS has absolutely nothing of value to offer there: you should be using off-line copies, and occasionally checking all copies for readability (e.g., by copying them to the null device - again, something you could do for your on-line copy with a cron job and which you should do for your off-line copy/copies once in a while as well. >You have to detect the problem first. ZFS block checksums will detect problems that a simple read-only pass through most other filesystems will not detect.> In sum, your support of ZFS in this specific area seems very much knee-jerk in nature rather than carefully thought out - exactly the kind of ''over-hyping'' that I pointed out in my first post in this thread. >And your opposition to ZFS appears knee-jerk and irrational, from this end. But telling you that will have no beneficial effect, any more than what you just told me about how my opinions appear to you. Couldn''t we leave personalities out of this, in future?> ... > > >>>> And yet I know many people who have lost data in >>>> >> ways >> >>>> that ZFS would >>>> have prevented. 
>>>> >>>> >>> Specifics would be helpful here. How many? Can they >>> >> reasonably be characterized as consumers (I''ll remind >> you once more: *that''s* the subject to which your >> comments purport to be responding)? Can the data loss >> reasonably be characterized as significant (to >> ''consumers'')? Were the causes hardware problems that >> could reasonably have been avoided (''bad cables'' >> might translate to ''improperly inserted, overly long, >> or severely kinked cables'', for example - and such a >> poorly-constructed system will tend to have other >> problems that ZFS cannot address)? >> >>> >>> >> "Reasonably avoided" is irrelevant; they *weren''t* >> avoided. >> > > While that observation has at least some merit, I''ll observe that you jumped directly to the last of my questions above while carefully ignoring the three questions that preceded it. > >You''ll notice (since you responded) that I got to at least one of them by the end of the message. And you cut out of your quotes what I specifically said about cables; that''s cheating. So I responded to *at least* two of the things you say I didn''t respond to.> ... > > >> Nearly everybody I can think of who''s used a computer >> for more than a >> couple of years has stories of stuff they''ve lost. >> > > Of course they have - and usually in ways that ZFS would have been no help whatsoever in mitigating. >ZFS will help detect problems with marginal drives, cables, power supplies, controllers, memory, and motherboards. The block checksumming will show corruptions earlier on, and less ambiguously, giving you a warning that there''s a problem to find and fix that you mostly didn''t get before. This will on average reduce the amount of damage done before the problem is fixed. And, of course, since ZFS can be used for data redundancy, it can help protect against more drastic problems like whole drives going bad, or tracks going bad. Those cover the majority of ways people I know have lost data. (Other ways include theft and user error, neither of which ZFS helps with very much.)> I > >> knew a lot of >> people who lost their entire hard drive at one point >> or other especially >> in the 1985-1995 timeframe. >> > > Fine example of a situation where only redundancy can save you, and where good old vanilla-flavored RAID (with scrubbing - but, as I noted, that''s hardly something that ZFS has any corner on) provides comparable protection to ZFS-with-mirroring. >Agreed. But ZFS *is* applicable to that case, just not *uniquely* applicable.> The people were quite > >> upset by the loss; >> I''m not going to accept somebody else deciding it''s >> "not significant". >> > > I never said such situations were not significant, David: I simply observed (and did so again above) that in virtually all of them ZFS offered no particular advantage over more conventional means of protection. >And I disagree, I hope finally in enough detail for you to understand my reasoning.> You need to get a grip and try to understand the *specifics* of what''s being discussed here if you want to carry on a coherent discussion about it. > >You need to make your language less personal and emotional when engaging in public technical discussions. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
> can you guess? wrote:...> > Most of the balance of your post isn''t addressed in > any detail because it carefully avoids the > fundamental issues that I raised: > > > > Not true; and by selective quoting you have removed > my specific > responses to most of these issues.While I''m naturally reluctant to call you an outright liar, David, you have hardly so far in this discussion impressed me as someone whose presentation is so well-organized and responsive to specific points that I can easily assume that I simply missed those responses. If you happen to have a copy of that earlier post, I''d like to see it resubmitted (without modification).> > 1. How much visible damage does a single-bit error > actually do to the kind of large photographic (e.g., > RAW) file you are describing? If it trashes the rest > of the file, as you state is the case with jpeg, then > you might have a point (though you''d still have to > address my second issue below), but if it results in > a virtually invisible blemish they you most certainly > don''t. > > > > I addressed this quite specifically, for two cases > (compressed raw vs. > uncompressed raw) with different results.Then please do so where we all can see it.> > > 2. If you actually care about your data, you''d > have to be a fool to entrust it to *any* single copy, > regardless of medium. And once you''ve got more than > one copy, then you''re protected (at the cost of very > minor redundancy restoration effort in the unlikely > event that any problem occurs) against the loss of > any one copy due to a minor error - the only loss of > non-negligible likelihood that ZFS protects against > better than other file systems. > > > > You have to detect the problem first. ZFS is in a > much better position > to detect the problem due to block checksums.Bulls***, to quote another poster here who has since been strangely quiet. The vast majority of what ZFS can detect (save for *extremely* rare undetectable bit-rot and for real hardware (path-related) errors that studies like CERN''s have found to be very rare - and you have yet to provide even anecdotal evidence to the contrary) can also be detected by scrubbing, and it''s arguably a lot easier to apply brute-force scrubbing (e.g., by scheduling a job that periodically copies your data to the null device if your system does not otherwise support the mechanism) than to switch your file system.> > > If you''re relying upon RAID to provide the multiple > copies - though this would also arguably be foolish, > if only due to the potential for trashing all the > copies simultaneously - you''d probably want to > schedule occasional scrubs, just in case you lost a > disk. But using RAID as a substitute for off-line > redundancy is hardly suitable in the kind of > archiving situations that you describe - and > therefore ZFS has absolutely nothing of value to > offer there: you should be using off-line copies, > and occasionally checking all copies for readability > (e.g., by copying them to the null device - again, > something you could do for your on-line copy with a > cron job and which you should do for your off-line > copy/copies once in a while as well. > > > > You have to detect the problem first.And I just described how to above - in a manner that also handles the off-line storage that you *should* be using for archival purposes (where ZFS scrubbing is useless). 
ZFS block> checksums will detect > problems that a simple read-only pass through most > other filesystems > will not detect.The only problems that ZFS will detect that a simple read-through pass will not are those that I just enumerated above: *extremely* rare undetectable bit-rot and real hardware (path-related) errors that studies like CERN''s have found to be very rare (like, none in the TB-sized installation under discussion here).> > > In sum, your support of ZFS in this specific area > seems very much knee-jerk in nature rather than > carefully thought out - exactly the kind of > ''over-hyping'' that I pointed out in my first post in > this thread. > > > > And your opposition to ZFS appears knee-jerk and > irrational, from this > end. But telling you that will have no beneficial > effect, any more than > what you just told me about how my opinions appear to > you. Couldn''t we > leave personalities out of this, in future?When someone appears to be arguing irrationally, it''s at least worth trying to straighten him out. But I''ll stop - *if* you start addressing the very specific and quantitative issues that you''ve been so assiduously skirting until now.> > > ... > > > > > >>>> And yet I know many people who have lost data in > >>>> > >> ways > >> > >>>> that ZFS would > >>>> have prevented. > >>>> > >>>> > >>> Specifics would be helpful here. How many? Can > they > >>> > >> reasonably be characterized as consumers (I''ll > remind > >> you once more: *that''s* the subject to which your > >> comments purport to be responding)? Can the data > loss > >> reasonably be characterized as significant (to > >> ''consumers'')? Were the causes hardware problems > that > >> could reasonably have been avoided (''bad cables'' > >> might translate to ''improperly inserted, overly > long, > >> or severely kinked cables'', for example - and such > a > >> poorly-constructed system will tend to have other > >> problems that ZFS cannot address)? > >> > >>> > >>> > >> "Reasonably avoided" is irrelevant; they *weren''t* > >> avoided. > >> > > > > While that observation has at least some merit, > I''ll observe that you jumped directly to the last of > my questions above while carefully ignoring the three > questions that preceded it. > > > > > > You''ll notice (since you responded) that I got to at > least one of them > by the end of the message.The only specific example you gave later was loss of an entire disk - something which ZFS does not handle significantly better than conventional RAID (for volatile data, and including scrubbing) and is a poorer choice for handling than off-line copies (for archival data). The statement of yours to which I was responding was "And yet I know many people who have lost data in ways that ZFS would have prevented." I suggest it was not unreasonable for me to have interpreted that sentence as including by implication " ... and that conventional storage arrangements would *not* have prevented", since otherwise it''s kind of pointless. 
If interpreted that way, your later example of loss of an entire disk does not qualify as any kind of answer to any of the questions that I posed (though it might have had at least tangential relevance if you had stated that the data loss had been not because the only copy disappeared but because it occurred on a RAID which could not be successfully rebuilt due to read errors on the surviving disks - even here, though, the question then becomes whether it was still possible to copy most of the data off the degraded array, which it usually should be). And you cut out of your> quotes what I > specifically said about cables; that''s cheating.Not at all: there was no need to repeat it (unless you think people generally don''t bother reading your own posts and would like others to try to help remedy that), because that was the one area in which you actually responded to what I had asked. ...> >> Nearly everybody I can think of who''s used a > computer > >> for more than a > >> couple of years has stories of stuff they''ve lost. > >> > > > > Of course they have - and usually in ways that ZFS > would have been no help whatsoever in mitigating. > > > > ZFS will help detect problems with marginal drives, > cables, power > supplies, controllers, memory, and motherboards.None of which the CERN study found in significant numbers (save for their RAID controller''s possible failure to report disk timeouts, but consumer systems - once again, the subject under discussion here - don''t use RAID controllers). The> block checksumming > will show corruptions earlier on, and less > ambiguously, giving you a > warning that there''s a problem to find and fix that > you mostly didn''t > get before.And mostly won''t get afterward either, because the incidence of such errors (especially after you eliminate those which conventional scrubbing will expose) is so low. You just can''t seem to grasp the fact that while this kind of error does occur, it occurs in such insignificant numbers that consumers *just won''t care*. So while ZFS is indeed ''better'' in this area, it''s just not *sufficiently* better to make any difference to consumers (sure, once in a while one may get hit with something that ZFS could have prevented, but for every such occurrence there''ll be hundreds or thousands of comparable problems that ZFS couldn''t help at all with). This is getting sufficiently tedious that I''m done with this part of the discussion unless you manage to respond with some actual substance. - bill This message posted from opensolaris.org
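For anyone wanting to try the brute-force read pass bill keeps referring to, a minimal sketch, assuming a standard Unix userland and a hypothetical data directory /data (the log path is likewise just an example):

  # force every sector of every file to be read (and ECC-checked by the drive);
  # unreadable sectors show up as I/O errors in the log
  find /data -type f -exec cat {} + > /dev/null 2>> /var/tmp/read-errors.log

  # crontab entry to repeat the pass at 03:00 on the first of each month
  # 0 3 1 * * find /data -type f -exec cat {} + > /dev/null 2>> /var/tmp/read-errors.log

Note that this only catches errors the drive itself can detect and report; silent bit flips pass through unnoticed, which is precisely the distinction the rest of this exchange turns on.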
> > can you guess? wrote: > > ... > > > > Most of the balance of your post isn''t addressed > in > > any detail because it carefully avoids the > > fundamental issues that I raised: > > > > > > > Not true; and by selective quoting you have > removed > > my specific > > responses to most of these issues. > > While I''m naturally reluctant to call you an outright > liar, David, you have hardly so far in this > discussion impressed me as someone whose presentation > is so well-organized and responsive to specific > points that I can easily assume that I simply missed > those responses. If you happen to have a copy of > that earlier post, I''d like to see it resubmitted > (without modification).Oh, dear: I got one post/response pair out of phase with the above - the post which I claimed did not address the issues that I raised *is* present here (and indeed does not address them). I still won''t call you an outright liar: you''re obviously just *very* confused about what qualifies as responding to specific points. And, just for the record, if you do have a copy of the post that disappeared, I''d still like to see it.> > > > 1. How much visible damage does a single-bit > error > > actually do to the kind of large photographic > (e.g., > > RAW) file you are describing? If it trashes the > rest > > of the file, as you state is the case with jpeg, > then > > you might have a point (though you''d still have to > > address my second issue below), but if it results > in > > a virtually invisible blemish they you most > certainly > > don''t. > > > > > > > I addressed this quite specifically, for two cases > > (compressed raw vs. > > uncompressed raw) with different results. > > Then please do so where we all can see it.Especially since there''s no evidence of it in the post (still right here, up above) where you appear to be claiming that you did. - bill This message posted from opensolaris.org
Chill. It''s a filesystem. If you don''t like it, don''t use it. Sincere Regards, -Tim can you guess? wrote:>> can you guess? wrote: >> > > ... > > >>> Most of the balance of your post isn''t addressed in >>> >> any detail because it carefully avoids the >> fundamental issues that I raised: >> >>> >>> >> Not true; and by selective quoting you have removed >> my specific >> responses to most of these issues. >> > > While I''m naturally reluctant to call you an outright liar, David, you have hardly so far in this discussion impressed me as someone whose presentation is so well-organized and responsive to specific points that I can easily assume that I simply missed those responses. If you happen to have a copy of that earlier post, I''d like to see it resubmitted (without modification). > > >>> 1. How much visible damage does a single-bit error >>> >> actually do to the kind of large photographic (e.g., >> RAW) file you are describing? If it trashes the rest >> of the file, as you state is the case with jpeg, then >> you might have a point (though you''d still have to >> address my second issue below), but if it results in >> a virtually invisible blemish they you most certainly >> don''t. >> >>> >>> >> I addressed this quite specifically, for two cases >> (compressed raw vs. >> uncompressed raw) with different results. >> > > Then please do so where we all can see it. > > >>> 2. If you actually care about your data, you''d >>> >> have to be a fool to entrust it to *any* single copy, >> regardless of medium. And once you''ve got more than >> one copy, then you''re protected (at the cost of very >> minor redundancy restoration effort in the unlikely >> event that any problem occurs) against the loss of >> any one copy due to a minor error - the only loss of >> non-negligible likelihood that ZFS protects against >> better than other file systems. >> >>> >>> >> You have to detect the problem first. ZFS is in a >> much better position >> to detect the problem due to block checksums. >> > > Bulls***, to quote another poster here who has since been strangely quiet. The vast majority of what ZFS can detect (save for *extremely* rare undetectable bit-rot and for real hardware (path-related) errors that studies like CERN''s have found to be very rare - and you have yet to provide even anecdotal evidence to the contrary) can also be detected by scrubbing, and it''s arguably a lot easier to apply brute-force scrubbing (e.g., by scheduling a job that periodically copies your data to the null device if your system does not otherwise support the mechanism) than to switch your file system. > > >>> If you''re relying upon RAID to provide the multiple >>> >> copies - though this would also arguably be foolish, >> if only due to the potential for trashing all the >> copies simultaneously - you''d probably want to >> schedule occasional scrubs, just in case you lost a >> disk. But using RAID as a substitute for off-line >> redundancy is hardly suitable in the kind of >> archiving situations that you describe - and >> therefore ZFS has absolutely nothing of value to >> offer there: you should be using off-line copies, >> and occasionally checking all copies for readability >> (e.g., by copying them to the null device - again, >> something you could do for your on-line copy with a >> cron job and which you should do for your off-line >> copy/copies once in a while as well. >> >>> >>> >> You have to detect the problem first. 
>> > > And I just described how to above - in a manner that also handles the off-line storage that you *should* be using for archival purposes (where ZFS scrubbing is useless). > > ZFS block > >> checksums will detect >> problems that a simple read-only pass through most >> other filesystems >> will not detect. >> > > The only problems that ZFS will detect that a simple read-through pass will not are those that I just enumerated above: *extremely* rare undetectable bit-rot and real hardware (path-related) errors that studies like CERN''s have found to be very rare (like, none in the TB-sized installation under discussion here). > > >>> In sum, your support of ZFS in this specific area >>> >> seems very much knee-jerk in nature rather than >> carefully thought out - exactly the kind of >> ''over-hyping'' that I pointed out in my first post in >> this thread. >> >>> >>> >> And your opposition to ZFS appears knee-jerk and >> irrational, from this >> end. But telling you that will have no beneficial >> effect, any more than >> what you just told me about how my opinions appear to >> you. Couldn''t we >> leave personalities out of this, in future? >> > > When someone appears to be arguing irrationally, it''s at least worth trying to straighten him out. But I''ll stop - *if* you start addressing the very specific and quantitative issues that you''ve been so assiduously skirting until now. > > >>> ... >>> >>> >>> >>>>>> And yet I know many people who have lost data in >>>>>> >>>>>> >>>> ways >>>> >>>> >>>>>> that ZFS would >>>>>> have prevented. >>>>>> >>>>>> >>>>>> >>>>> Specifics would be helpful here. How many? Can >>>>> >> they >> >>>>> >>>>> >>>> reasonably be characterized as consumers (I''ll >>>> >> remind >> >>>> you once more: *that''s* the subject to which your >>>> comments purport to be responding)? Can the data >>>> >> loss >> >>>> reasonably be characterized as significant (to >>>> ''consumers'')? Were the causes hardware problems >>>> >> that >> >>>> could reasonably have been avoided (''bad cables'' >>>> might translate to ''improperly inserted, overly >>>> >> long, >> >>>> or severely kinked cables'', for example - and such >>>> >> a >> >>>> poorly-constructed system will tend to have other >>>> problems that ZFS cannot address)? >>>> >>>> >>>>> >>>>> >>>>> >>>> "Reasonably avoided" is irrelevant; they *weren''t* >>>> avoided. >>>> >>>> >>> While that observation has at least some merit, >>> >> I''ll observe that you jumped directly to the last of >> my questions above while carefully ignoring the three >> questions that preceded it. >> >>> >>> >> You''ll notice (since you responded) that I got to at >> least one of them >> by the end of the message. >> > > The only specific example you gave later was loss of an entire disk - something which ZFS does not handle significantly better than conventional RAID (for volatile data, and including scrubbing) and is a poorer choice for handling than off-line copies (for archival data). > > The statement of yours to which I was responding was "And yet I know many people who have lost data in ways that ZFS would have prevented." I suggest it was not unreasonable for me to have interpreted that sentence as including by implication " ... and that conventional storage arrangements would *not* have prevented", since otherwise it''s kind of pointless. 
If interpreted that way, your later example of loss of an entire disk does not qualify as any kind of answer to any of the questions that I posed (though it might have had at least tangential relevance if you had stated that the data loss had been not because the only copy disappeared but because it occurred on a RAID which could not be successfully rebuilt due to read errors on the surviving disks - even here, though, the question then becomes whether it was still possible to copy most of the data off the degraded array, which it usually should be). > > And you cut out of your > >> quotes what I >> specifically said about cables; that''s cheating. >> > > Not at all: there was no need to repeat it (unless you think people generally don''t bother reading your own posts and would like others to try to help remedy that), because that was the one area in which you actually responded to what I had asked. > > ... > > >>>> Nearly everybody I can think of who''s used a >>>> >> computer >> >>>> for more than a >>>> couple of years has stories of stuff they''ve lost. >>>> >>>> >>> Of course they have - and usually in ways that ZFS >>> >> would have been no help whatsoever in mitigating. >> >>> >>> >> ZFS will help detect problems with marginal drives, >> cables, power >> supplies, controllers, memory, and motherboards. >> > > None of which the CERN study found in significant numbers (save for their RAID controller''s possible failure to report disk timeouts, but consumer systems - once again, the subject under discussion here - don''t use RAID controllers). > > The > >> block checksumming >> will show corruptions earlier on, and less >> ambiguously, giving you a >> warning that there''s a problem to find and fix that >> you mostly didn''t >> get before. >> > > And mostly won''t get afterward either, because the incidence of such errors (especially after you eliminate those which conventional scrubbing will expose) is so low. You just can''t seem to grasp the fact that while this kind of error does occur, it occurs in such insignificant numbers that consumers *just won''t care*. So while ZFS is indeed ''better'' in this area, it''s just not *sufficiently* better to make any difference to consumers (sure, once in a while one may get hit with something that ZFS could have prevented, but for every such occurrence there''ll be hundreds or thousands of comparable problems that ZFS couldn''t help at all with). > > This is getting sufficiently tedious that I''m done with this part of the discussion unless you manage to respond with some actual substance. > > - bill > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
> > Chill. It''s a filesystem. If you don''t like it, > don''t use it.Hey, I''m cool - it''s mid-November, after all. And it''s not about liking or not liking ZFS: it''s about actual merits vs. imagined ones, and about legitimate praise vs. illegitimate hype. Some of us have a professional interest in such things. If you don''t, by all means feel free to ignore the discussion. - bill This message posted from opensolaris.org
Hallelujah! I don''t know when this post actually appeared in the forum, but it wasn''t one I''d seen until right now. If it didn''t just appear due to whatever kind of fluke made the ''disappeared'' post appear right now too, I apologize for having missed it earlier.> In a compressed raw file, it''ll affect the rest of > the file generally; > so it essentially renders the whole thing useless, > unless it happens to > hit towards the end and you can crop around it. If > it hits in metadata > (statistically unlikely, the bulk of the file is > image data) it''s > probably at worst annoying, but it *might* hit one of > the bits software > uses to recognize and validate the file, too. > > In an uncompressed raw file, if it hits in image data > it''ll affect > probably 9 pixels; it''s easily fixed.That''s what I figured (and the above is the first time you''ve mentioned *compressed* RAW files, so the obvious next observation is that if they compress well - and if not, why bother compressing them? - then the amount of room that they occupy is significantly smaller and the likelihood of getting an error in one is similarly smaller). ...> > Even assuming that you meant ''MB'' rather than ''Mb'' > above, that suggests that it would take you well over > a decade to amass 1 TB of RAW data (assuming that, as > you suggest both above and later, you didn''t > accumulate several hundred MB of pictures *every* day > but just on those days when you were traveling, at a > sporting event, etc.). > > > > I seem to come up with a DVD full every month or two > these days, > myself. I mean, it varies; there was this one > weekend I filled 4 or > some such; but it varies both ways, and that average > isn''t too far > off. 25GB a year seems to take 40 years to reach > 1TB. However, my > rate has increased so dramatically in the last 7 > years that I''m not at > all sure what to expect; is it time for the curve to > level off yet, for > me? Who knows!Well, it still looks as if you''re taking well over a decade to fill 1 TB at present, as I estimated.> > Then again, I''m *also* working on scanning in the > *last* 40 years worth > of photos, and those tend to be bigger (scans are > less good pixels so > you need more of them), and *that* runs the numbers > up, in chunks when I > take time to do a big scanning batch.OK - that''s another new input, though not yet a quantitative one. ...> >> Even if you''ve got your original file archived, > you > >> still need > >> your working copies available, and Adobe Photoshop > >> can turn that > >> RAW file into a PSD of nearly 60Mb in some cases. > >> > > > > If you really amass all your pictures this way > (rather than, e.g., use Photoshop on some of them and > then save the result in a less verbose format), I''ll > suggest that this takes you well beyond the > ''consumer'' range of behavior. > > > > It''s not snapshot usage, but it''s common amateur > usage. Amateurs tend > to do lots of the same things professionals do (and > sometimes better, > though not usually). Hobbies are like that. > > The argument for the full Photoshop file is the > concept of > "nondestructive editing". I do retouching on new > layers instead of > erasing what I already have with the new stuff. I use > adjustment layers > with layer masks for curve adjustments. I can go > back and improve the > mask, or nudge the curves, without having to start > over from scratch. > It''s a huge win. 
And it may be more valuable for > amateurs, actually; > professionals tend to have the experience to know > their minds better and > know when they have it right, so many of them may do > less revisiting old > stuff and improving it a bit. Also, when the job is > done and sent to > the client, they tend not to care about it any more.OK - but at a *maximum* of 60 MB per shot you''re still talking about having to manually massage at least 20,000 shots in Photoshop before the result consumes 1 TB of space. That''s a *lot* of manual labor: do you really perform it on anything like that number of shots? - bill This message posted from opensolaris.org
... Having> my MP3 collection > gotten fucked up thanks to neither Windows nor NTFS > being able to > properly detect and report in-flight data corruption > (i.e. bad cable), > after copying it from one drive to another to replace > one of them, I''m > really glad that I''ve ZFS to manage my data these > days.Hmmm. All this talk about bad cables by you and others sounds more like older ATA (before transfers over the cable got CRC protection) than like contemporary drives. Was your experience with a recent drive and controller? ...> As far as all these reliability studies go, my > practical experience is > quite the opposite. I''m fixing computers of friends > and acquaintances > left and right, bad sectors are rather pretty common.I certainly haven''t found them to be common, unless a drive was on the verge of major failure. Though if a drive is used beyond its service life (usually 3 - 5 years) they may become more common. In any case, if a conventional scrub would detect the bad sector then ZFS per se wouldn''t add unique value (save that the check would be automated rather than something that the user, or system assembler, had to set up to be scheduled). I really meant it, though, when I said that I don''t completely discount anecdotal experience: I just like to get more particulars before deciding how much to weigh it against more formal analyses. - bill This message posted from opensolaris.org
> > On 9-Nov-07, at 3:23 PM, Scott Laird wrote: > > > Most video formats are designed to handle > errors--they''ll drop a frame > > or two, but they''ll resync quickly. So, depending > on the size of the > > error, there may be a visible glitch, but it''ll > keep working. > > > > Interestingly enough, this applies to a lot of > MPEG-derived formats as > > well, like MP3. I had a couple bad copies of MP3s > that I tried to > > listen to on my computer a few weeks ago (podcasts > copied via > > bluetooth off of my phone, apparently with no error > checking), and it > > made the story hard to follow when a few seconds > would disappear out > > of the middle, but it didn''t destroy the file. > > Well that''s nice. How about your database, your > source code, your ZIP > file, your encrypted file, ...They won''t be affected, because they''re so much smaller that (at something like 1 error per 10 TB) the chance of an error hitting them is negligible: that was the whole point of singling out huge video files as the only likely candidates to worry about. - bill This message posted from opensolaris.org
> > On 9-Nov-07, at 2:45 AM, can you guess? wrote:...> > This suggests that in a ZFS-style installation > without a hardware > > RAID controller they would have experienced at > worst a bit error > > about every 10^14 bits or 12 TB > > > And how about FAULTS? > hw/firmware/cable/controller/ram/...If you had read either the CERN study or what I already said about it, you would have realized that it included the effects of such faults. ...> > but I had a box that was randomly > >> corrupting blocks during > >> DMA. The errors showed up when doing a ZFS scrub > and > >> I caught the > >> problem in time. > > > > Yup - that''s exactly the kind of error that ZFS and > WAFL do a > > perhaps uniquely good job of catching. > > WAFL can''t catch all: It''s distantly isolated from > the CPU end.WAFL will catch everything that ZFS catches, including the kind of DMA error described above: it contains validating information outside the data blocks just as ZFS does. ...> > CERN was using relatively cheap disks > > Don''t forget every other component in the chain.I didn''t, and they didn''t: read the study. ...> > Your position is similar to that of an audiophile > enthused about a > > measurable but marginal increase in music quality > and trying to > > convince the hoi polloi that no other system will > do: while other > > audiophiles may agree with you, most people just > won''t consider it > > important - and in fact won''t even be able to > distinguish it at all. > > Data integrity *is* important.You clearly need to spend a lot more time trying to understand what you''ve read before responding to it. - bill This message posted from opensolaris.org
Just to note here as well as earlier that some of the confusion about what you had and had not said was related to my not having seen the post where you talked about RAW and compressed RAW errors until this morning. Since your other mysteriously ''disappeared'' post also appeared recently, I suspect that the RAW/compressed post was not present earlier when we were talking about its contents, but it is also possible that I just missed it. In any case, my response to you was based on your claim below ("by selective quoting") that this content had been in a post that I had responded to. - bill> > > can you guess? wrote: > > > > ... > > > > > > Most of the balance of your post isn''t > addressed > > in > > > any detail because it carefully avoids the > > > fundamental issues that I raised: > > > > > > > > > > Not true; and by selective quoting you have > > removed > > > my specific > > > responses to most of these issues. > > > > While I''m naturally reluctant to call you an > outright > > liar, David, you have hardly so far in this > > discussion impressed me as someone whose > presentation > > is so well-organized and responsive to specific > > points that I can easily assume that I simply > missed > > those responses. If you happen to have a copy of > > that earlier post, I''d like to see it resubmitted > > (without modification). > > Oh, dear: I got one post/response pair out of phase > with the above - the post which I claimed did not > address the issues that I raised *is* present here > (and indeed does not address them). > > I still won''t call you an outright liar: you''re > obviously just *very* confused about what qualifies > as responding to specific points. And, just for the > record, if you do have a copy of the post that > disappeared, I''d still like to see it. > > > > > > > 1. How much visible damage does a single-bit > > error > > > actually do to the kind of large photographic > > (e.g., > > > RAW) file you are describing? If it trashes the > > rest > > > of the file, as you state is the case with jpeg, > > then > > > you might have a point (though you''d still have > to > > > address my second issue below), but if it > results > > in > > > a virtually invisible blemish they you most > > certainly > > > don''t. > > > > > > > > > > I addressed this quite specifically, for two > cases > > > (compressed raw vs. > > > uncompressed raw) with different results. > > > > Then please do so where we all can see it. > > Especially since there''s no evidence of it in the > post (still right here, up above) where you appear to > be claiming that you did. > > - billThis message posted from opensolaris.org
No, you aren''t cool, and no it isn''t about zfs or your interest in it. It was clear from the get-go that netapp was paying you to troll any discussion on it, and to that end you''ve succeeded. Unfortunately you''ve done nothing but make yourself look like a pompous arrogant ass in every forum you''ve posted on to date. Yes, every point you''ve made could be refuted. No, nobody else is getting paid to post on forums, nor are they willing to go point for point with you again and again. Some of us have real jobs, families, and friends, and don''t have hours on end every day to spend arguing on the internet with paid trolls. I suggest getting a blog and ranting there, you have no audience here. This message posted from opensolaris.org
can you guess? wrote:> Hallelujah! I don''t know when this post actually appeared in the forum, but it wasn''t one I''d seen until right now. If it didn''t just appear due to whatever kind of fluke made the ''disappeared'' post appear right now too, I apologize for having missed it earlier. >I''m contributing to the email list associated with the forum, so there are the uncertainties of timing of email delivery, plus whatever process at the forum end goes on when new email arrives, before it will appear in the forum. It could be kind of variable I guess.>> In a compressed raw file, it''ll affect the rest of >> the file generally; >> so it essentially renders the whole thing useless, >> unless it happens to >> hit towards the end and you can crop around it. If >> it hits in metadata >> (statistically unlikely, the bulk of the file is >> image data) it''s >> probably at worst annoying, but it *might* hit one of >> the bits software >> uses to recognize and validate the file, too. >> >> In an uncompressed raw file, if it hits in image data >> it''ll affect >> probably 9 pixels; it''s easily fixed. >> > > That''s what I figured (and the above is the first time you''ve mentioned *compressed* RAW files, so the obvious next observation is that if they compress well - and if not, why bother compressing them? - then the amount of room that they occupy is significantly smaller and the likelihood of getting an error in one is similarly smaller). >The compression I see varies from something like 30% to 50%, very roughly (files reduced *by* 30%, not files reduced *to* 30%). This is with the Nikon D200, compressed NEF option. On some of the lower-level bodies, I believe the compression can''t be turned off. Smaller files will of course get hit less often -- or it''ll take longer to accumulate the terrabyte, is how I''d prefer to think of it. Damage that''s fixable is still damage; I think of this in archivist mindset, with the disadvantage of not having an external budget to be my own archivist.> ... > > >>> Even assuming that you meant ''MB'' rather than ''Mb'' >>> >> above, that suggests that it would take you well over >> a decade to amass 1 TB of RAW data (assuming that, as >> you suggest both above and later, you didn''t >> accumulate several hundred MB of pictures *every* day >> but just on those days when you were traveling, at a >> sporting event, etc.). >> >>> >>> >> I seem to come up with a DVD full every month or two >> these days, >> myself. I mean, it varies; there was this one >> weekend I filled 4 or >> some such; but it varies both ways, and that average >> isn''t too far >> off. 25GB a year seems to take 40 years to reach >> 1TB. However, my >> rate has increased so dramatically in the last 7 >> years that I''m not at >> all sure what to expect; is it time for the curve to >> level off yet, for >> me? Who knows! >> > > Well, it still looks as if you''re taking well over a decade to fill 1 TB at present, as I estimated. >Yes, I''m agreeing with you there.>> Then again, I''m *also* working on scanning in the >> *last* 40 years worth >> of photos, and those tend to be bigger (scans are >> less good pixels so >> you need more of them), and *that* runs the numbers >> up, in chunks when I >> take time to do a big scanning batch. >> > > OK - that''s another new input, though not yet a quantitative one. 
>I shot nearly as heavily back in the film era as I do now (within a factor of 2, let''s say); but when scanning existing processed photos, I exercise some selection about which ones to scan, so the file count on disk isn''t as large. (Much of my scanning was selective -- looking for photos for a class reunion, looking for photos of a dead friend; and the areas that have been more systematic were slides, which have been culled *twice* before scanning, and which were enough more expensive than B&W that I shot them more carefully. So I''m somewhat uncertain of the percentage I''d select in systematically going through the tri-x negatives). Very very roughly, I''m guessing I scan 1/4 of what I shot when I go through a roll systematically. And what I shot may have been as low as 1/2 as many as I''d shoot at the same events today. File sizes I choose for scanning 35mm range from 6MB to 150MB depending on film and purpose. So, very very roughly, an old scanned year might fill something like 1/4 * 1/2 * ( 30MB / 10MB) of the latest year (three eighths). There''s so much windage on so many of those numbers that we''re into WAG territory here. And other photographers will get very different numbers probably; they''ll make different choices at various stages.> ... > > >>>> Even if you''ve got your original file archived, >>>> >> you >> >>>> still need >>>> your working copies available, and Adobe Photoshop >>>> can turn that >>>> RAW file into a PSD of nearly 60Mb in some cases. >>>> >>>> >>> If you really amass all your pictures this way >>> >> (rather than, e.g., use Photoshop on some of them and >> then save the result in a less verbose format), I''ll >> suggest that this takes you well beyond the >> ''consumer'' range of behavior. >> >>> >>> >> It''s not snapshot usage, but it''s common amateur >> usage. Amateurs tend >> to do lots of the same things professionals do (and >> sometimes better, >> though not usually). Hobbies are like that. >> >> The argument for the full Photoshop file is the >> concept of >> "nondestructive editing". I do retouching on new >> layers instead of >> erasing what I already have with the new stuff. I use >> adjustment layers >> with layer masks for curve adjustments. I can go >> back and improve the >> mask, or nudge the curves, without having to start >> over from scratch. >> It''s a huge win. And it may be more valuable for >> amateurs, actually; >> professionals tend to have the experience to know >> their minds better and >> know when they have it right, so many of them may do >> less revisiting old >> stuff and improving it a bit. Also, when the job is >> done and sent to >> the client, they tend not to care about it any more. >> > > OK - but at a *maximum* of 60 MB per shot you''re still talking about having to manually massage at least 20,000 shots in Photoshop before the result consumes 1 TB of space. That''s a *lot* of manual labor: do you really perform it on anything like that number of shots? >Photoshop files don''t account for a significant amount of my total, no. As you suggest, I don''t do that kind of full-blown workup on a very large percentage of the photos. I was getting off-track (for this discussion), wanting to explain why you might save the Photoshop files, rather than jumping ahead to agreeing that the space involved wasn''t a big factor so it didn''t matter much for this discussion. 
-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
> No, you aren''t cool, and no it isn''t about zfs or > your interest in it. It was clear from the get-go > that netapp was paying you to troll any discussion on > it,It''s (quite literally) amazing how the most incompetent individuals turn out to be those who are the most certain of their misconceptions. In fact, there have been studies done that establish this as a statistically-significant trait among that portion of the population - so at least you aren''t alone in this respect. For the record, I have no connection with NetApp, I have never had any connection with NetApp (save for appreciating the elegance of their products), they never in any way asked me to take any part in any discussion on any subject whatsoever (let alone offered to pay me to do so), I don''t even *know* anyone at NetApp (at least that I''m aware of) save by professional reputation. In other words, you''ve got your head so far up your ass that you''re not only ready to make accusations that you do not (and in fact could not) have any evidence to support, you''re ready to make accusations that are factually flat wrong. Simply because an individual of your caliber apparently cannot conceive of the possibility that someone might take sufficient personal and professional interest in a topic to devote actual time and effort to attempting to cut through the hype that mostly well-meaning but less-than-objective and largely-uncritical supporters are shoveling out? Sheesh. ...> Yes, every point you''ve made could be refuted.Rather than drool about it, try taking an actual shot at doing so: though I''d normally consider talking with you to be a waste of my time, I''ll make an exception in this case. Call it a grudge match, if you want: I *really* don''t like the kind of incompetence that someone who behaves as you just did represents and also consider it something in the nature of a civic duty to expose if for what it is. ...> I suggest getting a blog and ranting there, you have > no audience here.Another demonstrably incorrect statement, I''m afraid: the contents of this thread make it clear that some people here, despite their preconceptions, do consider a detailed analysis of ZFS''s relative strengths to be a fit subject for discussion. And since it''s only human for them to resist changing those preconceptions, it''s hardly surprising that the discussion gets slightly heated at times. Education frequently can only occur through confrontation: existing biases make it difficult for milder forms to get through. I''d like to help people here learn something, but I''m at least equally interested in learning things myself - and since there are areas in which I consider ZFS''s design to be significantly sub-optimal, where better to test that opinion than here? Unfortunately, so far the discussion has largely bogged down in debate over just how important ZFS''s unique (save for WAFL) checksum protection mechanisms may be, and has not been very productive given the reluctance of many here to tackle that question quantitatively (though David eventually started to do so) - so there''s been very little opportunity for learning on my part save for a few details about media-file internals. 
I''m more interested in discussing things like whether my suggested fix for RAID-Z''s poor parallel-small-access performance overlooked some real obstacle, or why ZFS was presented as a highly-scalable file system when its largest files can require up to 6 levels of indirect blocks (making performance for random-access operations suck and causing snapshot data for updated large files to balloon) and it offers no obvious extension path to clustered operation (a single node - especially a single *commodity* node of the type that ZFS otherwise favors - runs out of steam in the PB range, or even lower for some workloads, and even breaking control out into a separate metadata server doesn''t get you that much farther), or whether ZFS''s apparently-centralized block-allocation mechanisms can scale well (using preallocation to predistribute large chunks that can be managed independently helps, but again doesn''t get you beyond the PB range at best), or the blind spot that some of the developers appear to have about the importance of on-disk contiguity for streaming access performance (128 KB chunks just don''t cut it in terms of efficient disk utilization in parallel environments unless they''re grouped together), or its trade-off of run-time performance and space use for performance when accessing snapshots (I''m guessing that it was more faith in the virtue of full-tree-path updating as compared with using a transaction log that actually caused that decision, so perhaps that''s the real subject for discussion). Of course, given that ZFS is what it is, there''s natural tendency just to plow forward and not ''waste time'' revisiting already-made decisions - so the people best able to discuss them may not want to. But you never know unless you try. - bill This message posted from opensolaris.org
In the previous and current responses, you seem quite determined of others misconceptions. Given that fact and the first paragraph of your response below, I think you can figure out why nobody on this list will reply to you again. can you guess? wrote:>> No, you aren''t cool, and no it isn''t about zfs or >> your interest in it. It was clear from the get-go >> that netapp was paying you to troll any discussion on >> it, >> > > It''s (quite literally) amazing how the most incompetent individuals turn out to be those who are the most certain of their misconceptions. In fact, there have been studies done that establish this as a statistically-significant trait among that portion of the population - so at least you aren''t alone in this respect. > > For the record, I have no connection with NetApp, I have never had any connection with NetApp (save for appreciating the elegance of their products), they never in any way asked me to take any part in any discussion on any subject whatsoever (let alone offered to pay me to do so), I don''t even *know* anyone at NetApp (at least that I''m aware of) save by professional reputation. In other words, you''ve got your head so far up your ass that you''re not only ready to make accusations that you do not (and in fact could not) have any evidence to support, you''re ready to make accusations that are factually flat wrong. > > Simply because an individual of your caliber apparently cannot conceive of the possibility that someone might take sufficient personal and professional interest in a topic to devote actual time and effort to attempting to cut through the hype that mostly well-meaning but less-than-objective and largely-uncritical supporters are shoveling out? Sheesh. > > ... > > >> Yes, every point you''ve made could be refuted. >> > > Rather than drool about it, try taking an actual shot at doing so: though I''d normally consider talking with you to be a waste of my time, I''ll make an exception in this case. Call it a grudge match, if you want: I *really* don''t like the kind of incompetence that someone who behaves as you just did represents and also consider it something in the nature of a civic duty to expose if for what it is. > > ... > > >> I suggest getting a blog and ranting there, you have >> no audience here. >> > > Another demonstrably incorrect statement, I''m afraid: the contents of this thread make it clear that some people here, despite their preconceptions, do consider a detailed analysis of ZFS''s relative strengths to be a fit subject for discussion. And since it''s only human for them to resist changing those preconceptions, it''s hardly surprising that the discussion gets slightly heated at times. > > Education frequently can only occur through confrontation: existing biases make it difficult for milder forms to get through. I''d like to help people here learn something, but I''m at least equally interested in learning things myself - and since there are areas in which I consider ZFS''s design to be significantly sub-optimal, where better to test that opinion than here? > > Unfortunately, so far the discussion has largely bogged down in debate over just how important ZFS''s unique (save for WAFL) checksum protection mechanisms may be, and has not been very productive given the reluctance of many here to tackle that question quantitatively (though David eventually started to do so) - so there''s been very little opportunity for learning on my part save for a few details about media-file internals. 
I''m more interested in discussing things like whether my suggested fix for RAID-Z''s poor parallel-small-access performance overlooked some real obstacle, or why ZFS was presented as a highly-scalable file system when its largest files can require up to 6 levels of indirect blocks (making performance for random-access operations suck and causing snapshot data for updated large files to balloon) and it offers no obvious extension path to clustered operation (a single node - especially a single *commodity* node of the type that ZFS otherwise favors - runs o > ut of steam in the PB range, or even lower for some workloads, and even breaking control out into a separate metadata server doesn''t get you that much farther), or whether ZFS''s apparently-centralized block-allocation mechanisms can scale well (using preallocation to predistribute large chunks that can be managed independently helps, but again doesn''t get you beyond the PB range at best), or the blind spot that some of the developers appear to have about the importance of on-disk contiguity for streaming access performance (128 KB chunks just don''t cut it in terms of efficient disk utilization in parallel environments unless they''re grouped together), or its trade-off of run-time performance and space use for performance when accessing snapshots (I''m guessing that it was more faith in the virtue of full-tree-path updating as compared with using a transaction log that actually caused that decision, so perhaps that''s the real subject for discussion). > > Of course, given that ZFS is what it is, there''s natural tendency just to plow forward and not ''waste time'' revisiting already-made decisions - so the people best able to discuss them may not want to. But you never know unless you try. > > - bill > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
James C. McPherson wrote:> can you guess? wrote: > ... > >> Ah - thanks to both of you. My own knowledge of video format internals >> is so limited that I assumed most people here would be at least equally >> familiar with the notion that a flipped bit or two in a video would >> hardly qualify as any kind of disaster (or often even as being >> noticeable, unless one were searching for it, in the case of >> commercial-quality video). >> >> David''s comment about jpeg corruption would be more worrisome if it were >> clear that any significant number of ''consumers'' (the immediate subject >> of my original comment in this area) had anything approaching 1 TB of >> jpegs on their systems (which at an average of 1 MB per jpeg would be >> around a million pictures...). If you include ''image files of various >> sorts'', as he did (though this also raises the question of whether we''re >> still talking about ''consumers''), then you also have to specify exactly >> how damaging single-bit errors are to those various ''sorts'' (one might >> guess not very for the uncompressed formats that might well be taking up >> most of the space). And since the CERN study seems to suggest that the >> vast majority of errors likely to be encountered at this level of >> incidence (and which could be caught by ZFS) are *detectable* errors, >> they''ll (in the unlikely event that you encounter them at all) typically >> only result in requiring use of a RAID (or backup) copy (surely one >> wouldn''t be entrusting data of any real value to a single disk). >> > > > I have to comment here. As a bloke with a bit of a photography > habit - I have a 10Mpx camera and I shoot in RAW mode - it is > very, very easy to acquire 1Tb of image files in short order. > > Each of the photos I take is between 8 and 11Mb, and if I''m > at a sporting event or I''m travelling for work or pleasure, > it is *incredibly* easy to amass several hundred Mb of photos > every single day. > > I''m by no means a professional photographer (so I''m not out > taking photos every single day), although a very close friend > of mine is. My photo storage is protected by ZFS with mirroring > and backups to dvd media. My profotog friend has 3 copies of > all her data - working set, immediate copy on usb-attached disk, > and second backup also on usb-attached disk but disconnected. > > Even if you''ve got your original file archived, you still need > your working copies available, and Adobe Photoshop can turn that > RAW file into a PSD of nearly 60Mb in some cases. > > It is very easy for the storage medium to acquire some degree > of corruption - whether it''s a CF or SD card, they all use > FAT32. I have been in the position of losing photos due to > this. Not many - perhaps a dozen over the course of 12 months. > > That flipped bit which you seem to be dismissing as "hardly... > a disaster" can in fact make your photo file totally useless, > because not only will you probably not be able to get the file > off the media card, but whatever software you''re using to keep > track of your catalog will also be unable to show you the > entire contents. That might be the image itself, or it might > be the equally important EXIF information. > > I don''t depend on FAT32-formatted media cards to make my > living, fortunately, but if I did I imagine I''d probably end > up only using each card for about a month before exercising > caution and purchasing a new one rather than depending on the > card itself to be reliable any more. 
> > 1Tb of photos shot on a 10MPx camera in the camera''s native > RAW format is around 100,000 photos. It''s not difficult to > imagine a "consumer" having that sort of storage requirement. >Hi, I have been watching the thread, having been using digital cameras for over 8 years, I have noticed that as resolution increases the file size changes accordingly. The quality of the images has improved dramatically, which is why newer cameras are purchased. As I also have a project to scan all my parents'' and grandparents'' old photographs to provide a record for my children, my disk usage has increased dramatically. Given that I purchased a digital video camera, this has further added to the need for disk space. Previously I had two 73GB disks, now I have 750GB (4 x 250GB RAID5). The problem of archival is either to copy to another computer, which I currently do, or hope that HVD becomes a cheap archival method.> > > James C. McPherson > -- > Senior Kernel Software Engineer, Solaris > Sun Microsystems > http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > >-- Regards Russell Email: russell dot aspinwall at flomerics dot co dot uk Network and Systems Administrator Flomerics Ltd Telephone: 020-8941-8810 x3116 81 Bridge Road Facsimile: 020-8941-8730 Hampton Court Surrey, KT8 9HH United Kingdom
> > In the previous and current responses, you seem quite > determined of > others misconceptions.I''m afraid that your sentence above cannot be parsed grammatically. If you meant that I *have* determined that some people here are suffering from various misconceptions, that''s correct. Given that fact and the first> paragraph of your > response below, I think you can figure out why nobody > on this list will > reply to you again.Predicting the future (especially the actions of others) is usually a feat reserved for psychics: are you claiming to be one (perhaps like the poster who found it ''clear'' that I was a paid NetApp troll - one of the aforementioned misconceptions)? Oh, well - what can one expect from someone who not only top-posts but completely fails to trim quotations? I see that you appear to be posting from a .edu domain, so perhaps next year you will at least mature to the point of becoming sophomoric. Whether people here find it sufficiently uncomfortable to have their beliefs (I''m almost tempted to say ''faith'', in some cases) challenged that they''ll indeed just shut up I really wouldn''t presume to guess. As for my own attitude, if you actually examine my responses rather than just go with your gut (which doesn''t seem to be a very reliable guide in your case) you''ll find that I tend to treat people pretty much as they deserve. If they don''t pay attention to what they''re purportedly responding to or misrepresent what I''ve said, I do chide them a bit (since I invariably *do* pay attention to what *they* say and make sincere efforts to respond to exactly that), and if they''re confrontational and/or derogatory then they''ll find me very much right back in their face. Perhaps it''s some kind of territorial thing - that people bridle when they find a seriously divergent viewpoint popping up in a cozy little community where most of them have congregated because they already share the beliefs of the group. Such in-bred communities do provide a kind of sanctuary and feeling of belonging: perhaps it''s unrealistic to expect most people to be able to rise above that and deal rationally with the wider world''s entry into their little one. Or not: we''ll see. - bill This message posted from opensolaris.org
On Sat, Nov 10, 2007 at 02:05:04PM -0200, Toby Thain wrote:> > Yup - that''s exactly the kind of error that ZFS and WAFL do a > > perhaps uniquely good job of catching. > > WAFL can''t catch all: It''s distantly isolated from the CPU end.How so? The checksumming method is different from ZFS, but as far as I understand rather similar in capability. -- Darren Dunham ddunham at taos.com Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area < This line left intentionally blank to confuse you. >
> > You have to detect the problem first. ZFS is in a > > much better position > > to detect the problem due to block checksums. > > Bulls***, to quote another poster here who has since been strangely quiet. > The vast majority of what ZFS can detect (save for *extremely* rare > undetectable bit-rot and for real hardware (path-related) errors that > studies like CERN''s have found to be very rare - and you have yet to > provide even anecdotal evidence to the contrary)You wanted anecdotal evidence: During my personal experience with only two home machines, ZFS has helped me detect corruption at least three times in a period of a few months. One due to silent corruption due to a controller bug (and a driver that did not work around it). Another time corruption during hotswapping (though this does not necessarily count since I did it on hardware that I did not know was supposed to support it, and I would not have attempted it to begin with otherwise). Third time I don''t remember now. You may disregard it if you wish. In my professional life I have seen bitflips a few times in the middle of real live data running on "real" servers that are used for important data. As a result I have become pretty paranoid about it all, making heavy use of par2. (I have also seen various file system corruption / system instability issues that may very well be consistent with bit flips / other forms of corruption, but where there has been no proof of the underlying cause of the problems.)> can also be detected by > scrubbing, and it''s arguably a lot easier to apply brute-force scrubbing > (e.g., by scheduling a job that periodically copies your data to the null > device if your system does not otherwise support the mechanism) than to > switch your file system.How would your magic scrubbing detect arbitrary data corruption without checksumming or redundancy? A lot of the data people save does not have checksumming. Even if it does, the file system meta data typically does not. Nor does various minor information related to the data (let''s say the meta data associated with your backup of your other data, even if that data has some internal checksumming). I think one needs to stop making excuses by observing properties of specific file types and similar. You can always use FEC to do error correction on arbitrary files if you really feel they are important. But the point is that with ZFS you get detection of *ANY* bit error for free (essentially), and optionally correction if you have redundancy. It doesn''t matter if it''s internal file system meta data, that important file you didn''t consider important from a corruption perspective, or in the middle of some larger file that you may or may not have applied FEC on otherwise. Even without fancy high-end requirements, it is nice to have some good statistical reason to believe that random corruption does not occur. Even if only to drive your web browsers or E-Mail client; at least you can be sure that random bitflips (unless they either are undetected due to an implementation bug, or occur in memory/etc) are not the cause of your random application misbehavior. It''s like choosing RAM. You can make excuses all you want about doing proper testing, buying good RAM, or having redundancy at other levels etc - but you will still sleep better knowing you have ECC RAM than some random junk. Or let''s do the seat belt analogy. 
You can try to convince yourself/other people all you want that you are a safe driver, that you should not drive in a way that allows crashes or whatever else - but you are still going to be safer with a seat belt than without it.

This is also why we care about fsync(). It doesn't matter that you spent $100000 on that expensive server with redundant PSUs hooked up to redundant UPS systems. *SHIT HAPPENS*, and when it does, you want to be maximally protected.

Yes, ZFS is not perfect. But to me, both in the context of personal use and more serious use, ZFS is, barring some implementation details, more or less exactly what I have always wanted, and it solves pretty much all of the major problems with storage.

And let me be clear: that is not hype. It's ZFS actually providing what I have wanted, and what I knew I wanted even before ZFS (or WAFL or whatever else) was ever on my radar.

For some reason some people seem to disagree. That's your business. But the next time you have a power outage, you'll be sorry if you had a database that didn't do fsync()[1], a filesystem that had no corruption checking whatsoever[2], a RAID5 system that didn't care about parity correctness in the face of a crash[3], and a filesystem or application whose data is not structured such that you can ascertain *what* is broken after the crash and what is not[4]. You will be even more sorry two years later when something really important malfunctions as a result of undetected corruption two years earlier...

[1] Because of course all serious players use proper UPS and a power outage should never happen unless you suck. (This has actually been advocated to me. Seriously.)
[2] Because of [1] and because of course you only run stable software that is well tested and will never be buggy. (This has been advocated. Seriously.)
[3] Because of [1].
[4] Because of [1], [2] and [3].

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
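[Editor's sketch] Peter's fsync() point is easy to make concrete. The example below is illustrative only and is not from the original exchange: it shows the usual write / flush / fsync / rename pattern an application uses when it actually wants data on stable storage before acknowledging it. The file names are made up, and Python is used only to keep the sketch short.

#!/usr/bin/env python
# Minimal sketch: why fsync() matters. write() alone only hands data to the
# OS page cache, and a power cut can discard it; fsync() asks the OS to push
# it down to the device before we claim success.
import os

def durable_write(path, data):
    tmp = path + ".tmp"                   # temporary file name (illustrative)
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()                         # drain Python's user-space buffer
        os.fsync(f.fileno())              # drain the OS cache to the device
    os.rename(tmp, path)                  # atomic replace on POSIX filesystems
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)                     # make the rename itself durable
    finally:
        os.close(dfd)

if __name__ == "__main__":
    durable_write("important.dat", b"data you would miss after a power outage\n")

Without the fsync() calls the data may still be sitting only in volatile caches when the power fails, which is exactly the scenario the footnotes above are mocking.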
Thanks for taking the time to flesh these points out. Comments below:

...

> The compression I see varies from something like 30% to 50%, very
> roughly (files reduced *by* 30%, not files reduced *to* 30%). This is
> with the Nikon D200, compressed NEF option. On some of the lower-level
> bodies, I believe the compression can't be turned off. Smaller files
> will of course get hit less often -- or it'll take longer to accumulate
> the terabyte, is how I'd prefer to think of it.

Either viewpoint works. And since the compression is not that great, you still wind up consuming a lot of space. Effectively, you're trading (at least if compression is an option rather than something that you're stuck with) the possibility that a picture will become completely useless should a bit get flipped for a storage space reduction of 30% - 50% - and that's a good trade, since it effectively allows you to maintain a complete backup copy on disk (for archiving, preferably off line) almost for free compared with the uncompressed option.

> Damage that's fixable is still damage; I think of this in archivist
> mindset, with the disadvantage of not having an external budget to be my
> own archivist.

There will *always* be the potential for damage, so the key is to make sure that any damage is easily fixable. The best way to do this is to a) keep multiple copies, b) keep them isolated from each other (that's why RAID is not a suitable approach to archiving), and c) check (scrub) them periodically to ensure that if you lose a piece (whether a bit or a sector) you can restore the affected data from another copy and thus return your redundancy to full strength.

For serious archiving, you probably want to maintain at least 3 such copies (possibly more if some are on media of questionable longevity). For normal use, there's probably negligible risk of losing any data if you maintain only two on reasonably reliable media: 'MAID' experience suggests that scrubbing as little as every few months reduces the likelihood of encountering detectable errors while restoring redundancy by several orders of magnitude (i.e., down to something like once in a PB at worst for disks - becoming comparable to the levels of bit-flip errors that the disk fails to detect at all).

Which is what I've been getting at w.r.t. ZFS in this particular application (leaving aside whether it can reasonably be termed a 'consumer' application - because bulk video storage is becoming one, and it not only uses a similar amount of storage space but should probably be protected using similar strategies): unless you're seriously worried about errors in the once-per-PB range, ZFS primarily just gives you automated (rather than manually-scheduled) scrubbing (and only for your on-line copy). Yes, it will help detect hardware faults as well if they happen to occur between RAM and the disk (and aren't otherwise detected - I'd still like to know whether the 'bad cable' experiences reported here occurred before ATA started CRCing its transfers), but while there's anecdotal evidence of such problems presented here, it doesn't seem to be corroborated by the few actual studies that I'm familiar with, so that risk is difficult to quantify.
Getting back to 'consumer' use for a moment, though: given that something like 90% of consumers entrust their PC data to the tender mercies of Windows, and a large percentage of those neither back up their data, nor use RAID to guard against media failures, nor protect it effectively from the perils of Internet infection, it would seem difficult to assert that whatever additional protection ZFS may provide would make any noticeable difference in the consumer space - and that was the kind of reasoning behind my comment that began this sub-discussion.

By George, we've managed to get around to having a substantive discussion after all: thanks for persisting until that occurred.

- bill
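[Editor's sketch] The manually-scheduled scrubbing that both posters keep referring to can be approximated with a very small script. The following is a sketch, not part of the original thread: it simply reads every byte of every file under a directory so that the drive's own sector ECC gets a chance to report unreadable sectors, which is all a brute-force scrub can do. Unlike a ZFS scrub it has no checksums of its own, so it cannot see silent corruption; it could be run every few months from cron.

#!/usr/bin/env python
# Minimal brute-force "scrub": read everything, discard it, report I/O errors.
import os
import sys

def scrub(root):
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(1 << 20):      # 1 MiB at a time, thrown away
                        pass
            except OSError as e:                # the disk reported a bad sector
                bad.append((path, e))
    return bad

if __name__ == "__main__":
    for path, err in scrub(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("read error:", path, err)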
Well, I guess we're going to remain stuck in this sub-topic for a bit longer:

> > The vast majority of what ZFS can detect (save for *extremely* rare
> > undetectable bit-rot and for real hardware (path-related) errors that
> > studies like CERN's have found to be very rare - and you have yet to
> > provide even anecdotal evidence to the contrary)
>
> You wanted anecdotal evidence:

To be accurate, the above was not a solicitation for just any kind of anecdotal evidence but for anecdotal evidence that specifically contradicted the notion that otherwise undetected path-related hardware errors are 'very rare'.

> During my personal experience with only two home machines, ZFS has helped
> me detect corruption at least three times in a period of a few months.
>
> One was silent corruption due to a controller bug (and a driver that did
> not work around it).

If that experience occurred using what could be considered normal consumer hardware and software, that's relevant (and disturbing). As I noted earlier, the only path-related problem that the CERN study unearthed involved their (hardly consumer-typical) use of RAID cards, the unusual demands that those cards placed on the WD disk firmware (to the point where it produced on-disk errors), and the cards' failure to report accompanying disk time-outs.

> Another time it was corruption during hotswapping (though this does not
> necessarily count, since I did it on hardware that I did not know was
> supposed to support it, and I would not have attempted it to begin with
> otherwise).

Using ZFS as a test platform to see whether you could get away with using hardware in a manner that it may not have been intended to be used may not really qualify as 'consumer' use. As I've noted before, consumer relevance remains the point in question here (since that's the point that fired off this lengthy sub-discussion).

...

> In my professional life I have seen bitflips a few times in the middle of
> real live data running on "real" servers that are used for important
> data. As a result I have become pretty paranoid about it all, making
> heavy use of par2.

And well you should - but, again, that's hardly 'consumer' use.

...

> > can also be detected by scrubbing, and it's arguably a lot easier to
> > apply brute-force scrubbing (e.g., by scheduling a job that periodically
> > copies your data to the null device if your system does not otherwise
> > support the mechanism) than to switch your file system.
>
> How would your magic scrubbing detect arbitrary data corruption without
> checksumming

The assertion is that it would catch the large majority of errors that ZFS would catch (i.e., all the otherwise detectable errors, most of them detected by the disk when it attempts to read a sector), leaving a residue of no noticeable consequence to consumers (especially as one could make a reasonable case that most consumers would not experience any noticeable problem even if *none* of these errors were noticed).

> or redundancy?

Redundancy is necessary if you want to fix (not just catch) errors, but conventional mechanisms provide redundancy just as effective as ZFS's.
(With the minor exception of ZFS's added metadata redundancy, but the likelihood that an error will happen to hit the relatively minuscule amount of metadata on a disk rather than the sea of data on it is, for consumers, certainly negligible, especially considering all the far more likely potential risks in the use of their PCs.)

> A lot of the data people save does not have checksumming.

*All* disk data is checksummed, right at the disk - and according to the studies I'm familiar with this detects most errors (certainly enough of those that ZFS also catches to satisfy most consumers). If you've got any quantitative evidence to the contrary, by all means present it.

...

> I think one needs to stop making excuses by observing properties of
> specific file types and similar.

I'm afraid that's incorrect: given the statistical incidence of the errors in question here, in normal consumer use only humongous files will ever experience them with non-negligible probability. So those are the kinds of files at issue.

When such a file experiences one of these errors, then either it will be one that ZFS is uniquely (save for WAFL) capable of detecting, or it will be one that more conventional mechanisms can detect. The latter are, according to the studies I keep mentioning, far more frequent (only relatively, of course: we're still only talking about one in every 10 TB or so, on average and according to manufacturers' specs, which seem to be if anything pessimistic in this area), and comprise primarily unreadable disk sectors which (as long as they're detected in a timely manner by scrubbing, whether ZFS's or some manually-scheduled mechanism) simply require that the bad sector (or file) be replaced by a good copy to restore the desired level of redundancy.

When we get into the realm of errors which are otherwise undetectable, we're either talking about disk read errors in the once-per-PB range (and, if they're single- or few-bit errors they won't noticeably affect the video files which typically dominate consumer storage space use) or about the kinds of hardware errors that some people here have raised anecdotally but AFAIK haven't come back to flesh out (e.g., after questions such as whether they occurred only before ATA started CRCing its transfers). Only the latter would seem to have any potential relevance to consumers, if indeed their incidence is non-negligible.

> You can always use FEC to do error correction on arbitrary files if you
> really feel they are important. But the point is that with ZFS you get
> detection of *ANY* bit error for free (essentially),

And the counterpoint is that while this is true it *just doesn't matter* in normal consumer use, because the incidence of errors which would otherwise go undetected is negligible.

...

> Even without fancy high-end requirements, it is nice to have some good
> statistical reason to believe that random corruption does not occur.

And you've already got it without ZFS: all ZFS does is add a few more decimal places to an already negligible (at least to consumers) risk.

...

> It's like choosing RAM.
> You can make excuses all you want about doing proper testing, buying good
> RAM, or having redundancy at other levels etc - but you will still sleep
> better knowing you have ECC RAM than some random junk.

No one is telling you not to do whatever it takes to help you sleep better: I'm just telling you that the comfort you attain thereby may not be strictly rational (i.e., commensurate with the actual effect of your action), so you should be careful about trying to apply that experience to others.

> Or let's do the seat belt analogy. You can try to convince yourself/other
> people all you want that you are a safe driver, that you should not drive
> in a way that allows crashes or whatever else - but you are still going
> to be safer with a seat belt than without it.

Indeed. And since studies have shown that if you are wearing a seat belt an airbag at best gives you very minimal additional protection (and in some cases might actually increase your risk), you really ought to stop telling people who already wear seat belts that they need airbags too.

> This is also why we care about fsync(). It doesn't matter that you spent
> $100000 on that expensive server with redundant PSUs hooked up to
> redundant UPS systems. *SHIT HAPPENS*, and when it does, you want to be
> maximally protected.

You're getting very far afield from consumer activities again, I'm afraid.

> Yes, ZFS is not perfect. But to me, both in the context of personal use
> and more serious use, ZFS is, barring some implementation details, more
> or less exactly what I have always wanted and solves pretty much all of
> the major problems with storage.

That's very nice for you: just (as I noted above) don't presume to apply your personal fetishes (including what may constitute a 'major' consumer storage problem) to everyone else.

> And let me be clear: that is not hype. It's ZFS actually providing what I
> have wanted, and what I knew I wanted even before ZFS (or WAFL or
> whatever else) was ever on my radar.

Having just dealt with that fairly bluntly above, let me state here that the same is true for me: that's why I was working on almost exactly the same kind of checksumming before I ever heard of ZFS (or knew that WAFL already had it). The difference is that I understand *quantitatively* how important it is - both to installations with serious reliability requirements (where it's a legitimate selling point, though not necessarily a dominant one save for pretty unusual installations) and in consumer use (where it's not).

> For some reason some people seem to disagree.

Perhaps I've now managed to make it clear exactly where that disagreement lies, if it wasn't clear before.

- bill
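[Editor's sketch] For readers trying to follow the 'one in every 10 TB or so' and 'once-per-PB' figures being argued about, a short back-of-the-envelope conversion may help. This is a sketch rather than anything from the thread itself: the 1-in-10^14-bits rate is the consumer-drive spec quoted elsewhere in this discussion, and the 1-in-10^15 rate is assumed here as a typical enterprise-class spec.

#!/usr/bin/env python
# Convert spec-sheet unrecoverable-read-error rates into "one event per N TB".
RATES = {
    "consumer spec (1 per 1e14 bits)": 1e14,
    "enterprise spec (1 per 1e15 bits, assumed)": 1e15,
}

for label, bits_per_error in RATES.items():
    tb_per_error = (bits_per_error / 8) / 1e12      # decimal terabytes read
    print(f"{label}: about one unreadable event per {tb_per_error:.1f} TB read")

# Approximate output:
#   consumer spec (1 per 1e14 bits): about one unreadable event per 12.5 TB read
#   enterprise spec (1 per 1e15 bits, assumed): about one unreadable event per 125.0 TB read

The 12.5 TB figure is the same '10^14 bits or 12 TB' number that appears later in this thread; errors the drive cannot detect at all fall into the much rarer once-per-PB class being discussed above.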
Some businesses do not accept any kind of risk and hence will try hard (i.e. spend a lot of money) to eliminate it (create 2, 3, 4 copies, read-verify, cksum...). At the moment only ZFS can give this assurance, plus the ability to self-correct detected errors.

It's a good thing that ZFS can help people store and manage their jpgs safely on their usb disk... but the real target customers here are companies that rely a lot on their data: their goal is to create value out of it. That is the case for CERN (a corrupted file might imply a missed Higgs particle) and for any mature company as a matter of fact (finance, government). These are the businesses to which ZFS gives real data storage assurance.

Selim

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/

On Nov 13, 2007 12:53 AM, can you guess? <billtodd at metrocast.net> wrote:
> Thanks for taking the time to flesh these points out. Comments below:
> [...]
can you guess? wrote:
> Vitesse VSC410
> Yes, it will help detect hardware faults as well if they happen to occur
> between RAM and the disk (and aren't otherwise detected - I'd still like
> to know whether the 'bad cable' experiences reported here occurred before
> ATA started CRCing its transfers), but while there's anecdotal evidence
> of such problems presented here, it doesn't seem to be corroborated by
> the few actual studies that I'm familiar with, so that risk is difficult
> to quantify.

It may not have been a bad cable, and it is a cheap HighPoint card, but I was running the card in RAID0 and getting random corrupted bytes on reads that went away when I switched to JBOD. The data was fine on disk, but I would get a corrupted byte every 250-500MB, and the only reason I noticed was because I was using Unison to sync folders and it kept reporting differences I knew shouldn't exist.

So "bad cable" type things do happen, and ZFS probably would have helped me notice it sooner. If I hadn't had another copy of the data I may still have been able to recover it, but only because most of the files were 1-1.5MB jpegs and the errors moved around, so I could have just copied a file repeatedly until I got a good copy - but that would have been a lot of work.

Jonathan
On 11-Nov-07, at 10:19 AM, can you guess? wrote:

>> On 9-Nov-07, at 2:45 AM, can you guess? wrote:
>
> ...
>
>>> This suggests that in a ZFS-style installation without a hardware
>>> RAID controller they would have experienced at worst a bit error
>>> about every 10^14 bits or 12 TB
>>
>> And how about FAULTS? hw/firmware/cable/controller/ram/...
>
> If you had read either the CERN study or what I already said about
> it, you would have realized that it included the effects of such
> faults.

...and ZFS is the only prophylactic available.

> ...
>
>>>> but I had a box that was randomly corrupting blocks during DMA.
>>>> The errors showed up when doing a ZFS scrub and I caught the
>>>> problem in time.
>>>
>>> Yup - that's exactly the kind of error that ZFS and WAFL do a
>>> perhaps uniquely good job of catching.
>>
>> WAFL can't catch all: It's distantly isolated from the CPU end.
>
> WAFL will catch everything that ZFS catches, including the kind of
> DMA error described above: it contains validating information
> outside the data blocks just as ZFS does.

Explain how it can do that, when it is isolated from the application by several layers including the network?

--Toby

> ...
>
>>> CERN was using relatively cheap disks
>>
>> Don't forget every other component in the chain.
>
> I didn't, and they didn't: read the study.
>
> ...
>
>>> Your position is similar to that of an audiophile enthused about a
>>> measurable but marginal increase in music quality and trying to
>>> convince the hoi polloi that no other system will do: while other
>>> audiophiles may agree with you, most people just won't consider it
>>> important - and in fact won't even be able to distinguish it at all.
>>
>> Data integrity *is* important.
>
> You clearly need to spend a lot more time trying to understand what
> you've read before responding to it.
>
> - bill
On Tue, Nov 13, 2007 at 07:33:20PM -0200, Toby Thain wrote:
> >>> Yup - that's exactly the kind of error that ZFS and WAFL do a
> >>> perhaps uniquely good job of catching.
> >>
> >> WAFL can't catch all: It's distantly isolated from the CPU end.
> >
> > WAFL will catch everything that ZFS catches, including the kind of
> > DMA error described above: it contains validating information
> > outside the data blocks just as ZFS does.
>
> Explain how it can do that, when it is isolated from the application
> by several layers including the network?

Ah, your "CPU end" was referring to the NFS client cpu, not the storage device CPU. That wasn't clear to me. The same limitations would apply to ZFS (or any other filesystem) when running in support of an NFS server.

I thought you were trying to describe a qualitative difference between ZFS and WAFL in terms of data checksumming in the on-disk layout.

--
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Hi Darren,

> Ah, your "CPU end" was referring to the NFS client cpu, not the storage
> device CPU. That wasn't clear to me. The same limitations would apply
> to ZFS (or any other filesystem) when running in support of an NFS
> server.
>
> I thought you were trying to describe a qualitative difference between
> ZFS and WAFL in terms of data checksumming in the on-disk layout.

Eh... NetApp can just open WAFL to neuter the argument... ;-) Or I suppose you could just run ZFS on top of an iSCSI or FC mount from the NetApp.

The problem it seems to me with criticizing ZFS as not much different than WAFL is that WAFL is really a networked storage backend, not a server operating system FS. If all you're using ZFS for is backending networked storage, the "not much different" criticism holds a fair amount of water, I think. However, that highlights what's special about ZFS... it isn't limited to just that use case. It's the first server OS FS (to my knowledge) to provide all those features in one place, and that's what makes it revolutionary, because you can truly use its features in any application with any storage. It's on that basis I think that placing ZFS and WAFL on equal footing is not a strong argument.

Best Regards,
Jason
...

>>> And how about FAULTS? hw/firmware/cable/controller/ram/...
>>
>> If you had read either the CERN study or what I already said about
>> it, you would have realized that it included the effects of such
>> faults.
>
> ...and ZFS is the only prophylactic available.

You don't *need* a prophylactic if you're not having sex: the CERN study found *no* clear instances of faults that would occur in consumer systems and that could be attributed to the kinds of errors that ZFS can catch and more conventional file systems can't. It found faults in the interaction of its add-on RAID controller (not a normal 'consumer' component) with its WD disks, it found single-bit errors that appeared to correlate with ECC RAM errors (i.e., likely occurred in RAM rather than at any point where ZFS would be involved), it found block-sized errors that appeared to correlate with misplaced virtual memory allocation (again, outside ZFS's sphere of influence).

> ...
>
>>>>> but I had a box that was randomly corrupting blocks during DMA.
>>>>> The errors showed up when doing a ZFS scrub and I caught the
>>>>> problem in time.
>>>>
>>>> Yup - that's exactly the kind of error that ZFS and WAFL do a
>>>> perhaps uniquely good job of catching.
>>>
>>> WAFL can't catch all: It's distantly isolated from the CPU end.
>>
>> WAFL will catch everything that ZFS catches, including the kind of
>> DMA error described above: it contains validating information
>> outside the data blocks just as ZFS does.
>
> Explain how it can do that, when it is isolated from the application
> by several layers including the network?

Darrell covered one aspect of this (i.e., that ZFS couldn't either if it were being used in a server), but there's another as well: as long as the NFS messages between client RAM and server RAM are checksummed in RAM on both ends, then that extends the checking all the way to client RAM (the same place where local ZFS checks end) save for any problems occurring *in* RAM at one end or the other (and ZFS can't deal with in-RAM problems either: all it can do is protect the data until it gets to RAM).

- bill
> Darrell

My apologies, Darren.

- bill
> Some businesses do not accept any kind of risk

Businesses *always* accept risk: they just try to minimize it within the constraints of being cost-effective. Which is a good thing for ZFS, because it can't eliminate risk either, just help to minimize it cost-effectively.

However, the subject here is not business use but 'consumer' use.

...

> At the moment only ZFS can give this assurance, plus the ability to
> self-correct detected errors.

You clearly aren't very familiar with WAFL (which can do the same).

- bill
...

> I was running the card in RAID0 and getting random corrupted bytes on
> reads that went away when I switched to JBOD.

Then it kind of sounds like a card problem rather than a cable problem.

Perhaps there's a very basic definition issue here: when I use the term 'consumer', I'm referring to the people who buy a computer and never open up the case, not to people who fool around with RAID cards. I'm referring to people who would likely say "What?" if you referred to Linux. I'm referring to people who would be extremely unlikely to be found participating in this forum.

In other words, to the overwhelming majority of PC users, who don't want to hear anything that suggests that they might have to become more intimately involved with their computer in order to make it better (let alone 'better' in the relatively marginal and fairly abstruse ways that ZFS would).

I'd include most Mac users as well, except that they've just suffered a major disruption to their world-view by being told that moving to the previously-despised Intel platform constitutes an *upgrade* - so if you can give them any excuse to think that ZFS is superior (and not available on Windows) they'll likely grab for it like desperate voyagers on the Titanic grabbed for life preservers (hey, Steve's no dummy).

People like you and me with somewhat more knowledge about computers are like airline employees who tend to choose their seats with an eye toward crash survivability: no, this probably won't mean they'll survive a crash, but it makes them feel better to be doing the little that they can. And they just accept the fact that the rest of the world would prefer not to think that they had to worry about crashes at all (if they thought otherwise, a lot more planes would be built with their seats facing backward).

- bill
can you guess? wrote:
>> At the moment only ZFS can give this assurance, plus the ability to
>> self-correct detected errors.
>
> You clearly aren't very familiar with WAFL (which can do the same).

That's quite possibly a factor. I'm pretty thoroughly unfamiliar with WAFL myself, though I think I've probably used it via NFS in a work environment or two.

In any case, so far as I can tell it's quite irrelevant to me at home; I can't afford it.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
can you guess? wrote:
> ...
>
>> I was running the card in RAID0 and getting random corrupted bytes on
>> reads that went away when I switched to JBOD.
>
> Then it kind of sounds like a card problem rather than a cable problem.
>
> Perhaps there's a very basic definition issue here: when I use the term
> 'consumer', I'm referring to the people who buy a computer and never open
> up the case, not to people who fool around with RAID cards. I'm referring
> to people who would likely say "What?" if you referred to Linux. I'm
> referring to people who would be extremely unlikely to be found
> participating in this forum.
>
> In other words, to the overwhelming majority of PC users, who don't want
> to hear anything that suggests that they might have to become more
> intimately involved with their computer in order to make it better (let
> alone 'better' in the relatively marginal and fairly abstruse ways that
> ZFS would).

Statistically there are a lot of them. But I've known lots of early-adopters with no professional computer background (one of them has *since then* done some tech-writing work) who built machines from parts, replaced motherboards, upgraded the processors in "non-upgradeable" MACs, ran OS/2, even converted themselves to Linux, on their home systems. These are consumer users too.

> I'd include most Mac users as well, except that they've just suffered a
> major disruption to their world-view by being told that moving to the
> previously-despised Intel platform constitutes an *upgrade* - so if you
> can give them any excuse to think that ZFS is superior (and not available
> on Windows) they'll likely grab for it like desperate voyagers on the
> Titanic grabbed for life preservers (hey, Steve's no dummy).

Long-time MAC users must be getting used to having their entire world disrupted and having to re-buy all their software. This is at least the second complete flag-day (no forward or backwards compatibility) change they've been through.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
> can you guess? wrote:
>>> At the moment only ZFS can give this assurance, plus the ability to
>>> self-correct detected errors.
>>
>> You clearly aren't very familiar with WAFL (which can do the same).
>
> ... so far as I can tell it's quite irrelevant to me at home; I can't
> afford it.

Neither can I - but the poster above was (however irrelevantly) talking about ZFS's supposedly unique features for *businesses*, so I answered in that context.

(By the way, something has gone West with my email and I'm temporarily unable to send the response I wrote to your message night before last. If you meant to copy it here as well, just do so and I'll respond to it here.)

- bill
>> On 9-Nov-07, at 2:45 AM, can you guess? wrote:
>>
>>>> Au contraire: I estimate its worth quite accurately from the
>>>> undetected error rates reported in the CERN "Data Integrity" paper
>>>> published last April (first hit if you Google 'cern "data
>>>> integrity"').
>>>>
>>>>> While I have yet to see any checksum error reported by ZFS on
>>>>> Symmetrix arrays or FC/SAS arrays with some other "cheap" HW I've
>>>>> seen many of them
>>>>
>>>> While one can never properly diagnose anecdotal issues off the cuff
>>>> in a Web forum, given CERN's experience you should probably check
>>>> your configuration very thoroughly for things like marginal
>>>> connections: unless you're dealing with a far larger data set than
>>>> CERN was, you shouldn't have seen 'many' checksum errors.
>>>
>>> Well single bit error rates may be rare in normal operation hard
>>> drives, but from a systems perspective, data can be corrupted anywhere
>>> between disk and CPU.
>>
>> The CERN study found that such errors (if they found any at all,
>> which they couldn't really be sure of) were far less common than

I will note from multiple personal experiences that these issues _do_ happen with NetApp and EMC (Symm and Clariion) -- I will also say that many times you do not read about them, because you will find that when they do happen to you, one of the first people to show up on your site will be their legal team pushing paper and sharking for signatures.

-Wade
On 13-Nov-07, at 9:18 PM, A Darren Dunham wrote:

> On Tue, Nov 13, 2007 at 07:33:20PM -0200, Toby Thain wrote:
>>>>> Yup - that's exactly the kind of error that ZFS and WAFL do a
>>>>> perhaps uniquely good job of catching.
>>>>
>>>> WAFL can't catch all: It's distantly isolated from the CPU end.
>>>
>>> WAFL will catch everything that ZFS catches, including the kind of
>>> DMA error described above: it contains validating information
>>> outside the data blocks just as ZFS does.
>>
>> Explain how it can do that, when it is isolated from the application
>> by several layers including the network?
>
> Ah, your "CPU end" was referring to the NFS client cpu, not the storage
> device CPU. That wasn't clear to me. The same limitations would apply
> to ZFS (or any other filesystem) when running in support of an NFS
> server.
>
> I thought you were trying to describe a qualitative difference between
> ZFS and WAFL in terms of data checksumming in the on-disk layout.

Yes, I was comparing apples and oranges, as our mysterious friend will be sure to point out. But I still don't think WAFL and ZFS are interchangeable, because if you *really* care about integrity you won't choose an isolated storage subsystem - and does anyone run WAFL on the application host?

--Toby
On 14-Nov-07, at 12:43 AM, Jason J. W. Williams wrote:

> Hi Darren,
>
>> Ah, your "CPU end" was referring to the NFS client cpu, not the storage
>> device CPU. That wasn't clear to me. The same limitations would apply
>> to ZFS (or any other filesystem) when running in support of an NFS
>> server.
>>
>> I thought you were trying to describe a qualitative difference between
>> ZFS and WAFL in terms of data checksumming in the on-disk layout.
>
> Eh... NetApp can just open WAFL to neuter the argument... ;-) Or I
> suppose you could just run ZFS on top of an iSCSI or FC mount from the
> NetApp.
>
> The problem it seems to me with criticizing ZFS as not much different
> than WAFL is that WAFL is really a networked storage backend, not a
> server operating system FS. If all you're using ZFS for is backending
> networked storage, the "not much different" criticism holds a fair
> amount of water, I think. However, that highlights what's special about
> ZFS... it isn't limited to just that use case. It's the first server OS
> FS (to my knowledge) to provide all those features in one place, and
> that's what makes it revolutionary, because you can truly use its
> features in any application with any storage. It's on that basis I
> think that placing ZFS and WAFL on equal footing is not a strong
> argument.

That was my thinking, and better put than I could, thank you.

--Toby
On 14-Nov-07, at 7:06 AM, can you guess? wrote:

> ...
>
>>>> And how about FAULTS? hw/firmware/cable/controller/ram/...
>>>
>>> If you had read either the CERN study or what I already said about
>>> it, you would have realized that it included the effects of such
>>> faults.
>>
>> ...and ZFS is the only prophylactic available.
>
> You don't *need* a prophylactic if you're not having sex: the CERN
> study found *no* clear instances of faults that would occur in
> consumer systems and that could be attributed to the kinds of errors
> that ZFS can catch and more conventional file systems can't.

Hmm, that's odd, because I've certainly had such faults myself. (Bad RAM is a very common one, that nobody even thinks to check.)

--Toby

> It found faults in the interaction of its add-on RAID controller (not
> a normal 'consumer' component) with its WD disks, it found single-bit
> errors that appeared to correlate with ECC RAM errors (i.e., likely
> occurred in RAM rather than at any point where ZFS would be involved),
> it found block-sized errors that appeared to correlate with misplaced
> virtual memory allocation (again, outside ZFS's sphere of influence).
> [...]
...

> The problem it seems to me with criticizing ZFS as not much different
> than WAFL is that WAFL is really a networked storage backend, not a
> server operating system FS. If all you're using ZFS for is backending
> networked storage, the "not much different" criticism holds a fair
> amount of water, I think.

A more fundamental problem is that there are several different debates going on in this one thread.

The comparison with WAFL is primarily about the question of just how 'novel' ZFS's design is (leaving aside any questions about patent enforceability) and especially about just how 'unique' its reliability approaches are for environments that require them. In a nutshell, while some COW approaches predate both WAFL and ZFS, WAFL was arguably the first to come up with the kind of 'write anywhere' approach that ZFS also heavily relies upon, and to the best of my knowledge WAFL was also the first to incorporate the kind of in-parent verification that has played such a prominent part in the integrity discussion here.

Another prominent debate in this thread revolves around the question of just how significant ZFS's unusual strengths are for *consumer* use. WAFL clearly plays no part in that debate, because it's available only on closed, server systems.

> However, that highlights what's special about ZFS... it isn't limited to
> just that use case.

The major difference between ZFS and WAFL in this regard is that ZFS batch-writes-back its data to disk without first aggregating it in NVRAM (a subsidiary difference is that ZFS maintains a small-update log which WAFL's use of NVRAM makes unnecessary). Decoupling the implementation from NVRAM makes ZFS usable on arbitrary rather than specialized platforms, and that without doubt constitutes a significant advantage by increasing the available options (in both platform and price) for those installations that require the kind of protection (and ease of management) that both WAFL and ZFS offer and that don't require the level of performance that WAFL provides and ZFS often may not (the latter hasn't gotten much air time here, and while it can be discussed to some degree in the abstract, a better approach would be to have some impartial benchmarks to look at, because the on-disk block layouts do differ significantly and sometimes subtly even if the underlying approaches don't).

- bill
> On 14-Nov-07, at 7:06 AM, can you guess? wrote:
>
>> You don't *need* a prophylactic if you're not having sex: the CERN
>> study found *no* clear instances of faults that would occur in
>> consumer systems and that could be attributed to the kinds of errors
>> that ZFS can catch and more conventional file systems can't.
>
> Hmm, that's odd, because I've certainly had such faults myself. (Bad
> RAM is a very common one,

You really ought to read a post before responding to it: the CERN study did encounter bad RAM (and my post mentioned that) - but ZFS usually can't do a damn thing about bad RAM, because errors tend to arise either before ZFS ever gets the data or after it has already returned and checked it (and in both cases, ZFS will think that everything's just fine).

> that nobody even thinks to check.)

Speak for yourself: I've run memtest86+ on all our home systems, and I run it again whenever encountering any problem that might be RAM-related.

- bill
...

>>> Well single bit error rates may be rare in normal operation hard
>>> drives, but from a systems perspective, data can be corrupted anywhere
>>> between disk and CPU.
>>
>> The CERN study found that such errors (if they found any at all,
>> which they couldn't really be sure of) were far less common than
>
> I will note from multiple personal experiences that these issues _do_
> happen with NetApp and EMC (Symm and Clariion)

And Robert already noted that they've occurred in his mid-range arrays.

In both cases, however, you're talking about decidedly non-consumer hardware, and had you looked more carefully at the material to which you were responding you would have found that its comments were in the context of experiences with consumer hardware (and in particular what *quantitative* level of additional protection ZFS's 'special sauce' can be considered to add to its reliability). Errors introduced by mid-range and high-end arrays don't enter into that discussion (though they're interesting for other reasons).

- bill
On Thu, Nov 08, 2007 at 07:28:47PM -0800, can you guess? wrote:
> > How so? In my opinion, it seems like a cure for the brain damage of
> > RAID-5.
>
> Nope.
>
> A decent RAID-5 hardware implementation has no 'write hole' to worry
> about, and one can make a software implementation similarly robust with
> some effort (e.g., by using a transaction log to protect the
> data-plus-parity double-update or by using COW mechanisms like ZFS's in
> a more intelligent manner).

Can you reference a software RAID implementation which implements a solution to the write hole and performs well? My understanding (and this is based on what I've been told by people more knowledgeable in this domain than I) is that software RAID has suffered from being unable to provide both correctness and acceptable performance.

> The part of RAID-Z that's brain-damaged is its
> concurrent-small-to-medium-sized-access performance (at least up to
> request sizes equal to the largest block size that ZFS supports, and
> arguably somewhat beyond that): while conventional RAID-5 can satisfy
> N+1 small-to-medium read accesses or (N+1)/2 small-to-medium write
> accesses in parallel (though the latter also take an extra rev to
> complete), RAID-Z can satisfy only one small-to-medium access request at
> a time (well, plus a smidge for read accesses if it doesn't verify the
> parity) - effectively providing RAID-3-style performance.

Brain damage seems a bit of an alarmist label. While you're certainly right that for a given block we do need to access all disks in the given stripe, it seems like a rather quaint argument: aren't most environments that matter trying to avoid waiting for the disk at all? Intelligent prefetch and large caches -- I'd argue -- are far more important for performance these days.

> The easiest way to fix ZFS's deficiency in this area would probably be
> to map each group of N blocks in a file as a stripe with its own parity
> - which would have the added benefit of removing any need to handle
> parity groups at the disk level (this would, incidentally, not be a bad
> idea to use for mirroring as well, if my impression is correct that
> there's a remnant of LVM-style internal management there). While this
> wouldn't allow use of parity RAID for very small files, in most
> installations they really don't occupy much space compared to that used
> by large files, so this should not constitute a significant drawback.

I don't really think this would be feasible given how ZFS is stratified today, but go ahead and prove me wrong: here are the instructions for bringing over a copy of the source code:

  http://www.opensolaris.org/os/community/tools/scm

- ahl

--
Adam Leventhal, FishWorks                     http://blogs.sun.com/ahl
Adam Leventhal wrote:
> On Thu, Nov 08, 2007 at 07:28:47PM -0800, can you guess? wrote:
>>> How so? In my opinion, it seems like a cure for the brain damage of
>>> RAID-5.
>>
>> Nope.
>>
>> A decent RAID-5 hardware implementation has no 'write hole' to worry
>> about, and one can make a software implementation similarly robust with
>> some effort (e.g., by using a transaction log to protect the
>> data-plus-parity double-update or by using COW mechanisms like ZFS's in
>> a more intelligent manner).
>
> Can you reference a software RAID implementation which implements a
> solution to the write hole and performs well?

No, but I described how to use a transaction log to do so, and later on in the post how ZFS could implement a different solution more consistent with its current behavior. In the case of the transaction log, the key is to use the log not only to protect the RAID update but to protect the associated higher-level file operation as well, such that a single log force satisfies both (otherwise, logging the RAID update separately would indeed slow things down - unless you had NVRAM to use for it, in which case you've effectively just reimplemented a low-end RAID controller - which is probably why no one has implemented that kind of solution in a stand-alone software RAID product).

...

>> The part of RAID-Z that's brain-damaged is its
>> concurrent-small-to-medium-sized-access performance (at least up to
>> request sizes equal to the largest block size that ZFS supports, and
>> arguably somewhat beyond that): while conventional RAID-5 can satisfy
>> N+1 small-to-medium read accesses or (N+1)/2 small-to-medium write
>> accesses in parallel (though the latter also take an extra rev to
>> complete), RAID-Z can satisfy only one small-to-medium access request
>> at a time (well, plus a smidge for read accesses if it doesn't verify
>> the parity) - effectively providing RAID-3-style performance.
>
> Brain damage seems a bit of an alarmist label.

I consider 'brain damage' to be if anything a charitable characterization.

> While you're certainly right that for a given block we do need to access
> all disks in the given stripe, it seems like a rather quaint argument:
> aren't most environments that matter trying to avoid waiting for the
> disk at all?

Everyone tries to avoid waiting for the disk at all. Remarkably few succeed very well.

> Intelligent prefetch and large caches -- I'd argue -- are far more
> important for performance these days.

Intelligent prefetch doesn't do squat if your problem is disk throughput (which in server environments it frequently is). And all caching does (if you're lucky and your workload benefits much at all from caching) is improve your system throughput at the point where you hit the disk throughput wall. Improving your disk utilization, by contrast, pushes back that wall. And as I just observed in another thread, not by 20% or 50% but potentially by around two decimal orders of magnitude if you compare the sequential scan performance of multiple randomly-updated database tables between a moderately coarsely-chunked conventional RAID and a fine-grained ZFS block size (e.g., the 16 KB used by the example database) with each block sprayed across several disks.

Sure, that's a worst-case scenario.
But two orders of magnitude is a hell of a lot, even if it doesn't happen often - and suggests that in more typical cases you're still likely leaving a considerable amount of performance on the table, even if that amount is a lot less than a factor of 100.

>> The easiest way to fix ZFS's deficiency in this area would probably be
>> to map each group of N blocks in a file as a stripe with its own parity
>> - which would have the added benefit of removing any need to handle
>> parity groups at the disk level (this would, incidentally, not be a bad
>> idea to use for mirroring as well, if my impression is correct that
>> there's a remnant of LVM-style internal management there). While this
>> wouldn't allow use of parity RAID for very small files, in most
>> installations they really don't occupy much space compared to that used
>> by large files, so this should not constitute a significant drawback.
>
> I don't really think this would be feasible given how ZFS is stratified
> today, but go ahead and prove me wrong: here are the instructions for
> bringing over a copy of the source code:
>
>   http://www.opensolaris.org/os/community/tools/scm

Now you want me not only to design the fix but code it for you? I'm afraid that you vastly overestimate my commitment to ZFS: while I'm somewhat interested in discussing it and happy to provide what insights I can, I really don't personally care whether it succeeds or fails.

But I sort of assumed that you might.

- bill
Hello can,

Thursday, November 15, 2007, 2:54:21 AM, you wrote:

cyg> The major difference between ZFS and WAFL in this regard is that
cyg> ZFS batch-writes-back its data to disk without first aggregating
cyg> it in NVRAM (a subsidiary difference is that ZFS maintains a
cyg> small-update log which WAFL's use of NVRAM makes unnecessary).
cyg> Decoupling the implementation from NVRAM makes ZFS usable on
cyg> arbitrary rather than specialized platforms, and that without
cyg> doubt constitutes a significant advantage by increasing the
cyg> available options (in both platform and price) for those
cyg> installations that require the kind of protection (and ease of
cyg> management) that both WAFL and ZFS offer and that don't require
cyg> the level of performance that WAFL provides and ZFS often may not
cyg> (the latter hasn't gotten much air time here, and while it can be
cyg> discussed to some degree in the abstract a better approach would
cyg> be to have some impartial benchmarks to look at, because the
cyg> on-disk block layouts do differ significantly and sometimes
cyg> subtly even if the underlying approaches don't).

Well, ZFS allows you to put its ZIL on a separate device which could be NVRAM.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
On 11/15/07 9:05 AM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

> [...]
>
> Well, ZFS allows you to put its ZIL on a separate device which could
> be NVRAM.

Like the RAMSAN SSD: http://www.superssd.com/products/ramsan-300/

It is the only FC-attached, battery-backed SSD that I know of, and we have dreams of clusterfication. Otherwise we would use one of those PCI-Express based NVRAM cards that are on the horizon. My initial results for lots of small files were very pleasing.

I dream of a JBOD with lots of disks + something like this built into 3U. Too bad Sun's forthcoming JBODs probably won't have anything similar to this...

-Andy
...
> Well, ZFS allows you to put its ZIL on a separate
> device which could
> be NVRAM.

And that's a GOOD thing (especially because it's optional rather than requiring that special hardware be present). But if I understand the ZIL correctly, it's not as effective as using NVRAM as a more general kind of log for a wider range of data sizes and types, as WAFL does.

- bill

This message posted from opensolaris.org
> Brain damage seems a bit of an alarmist label. While you're certainly right
> that for a given block we do need to access all disks in the given stripe,
> it seems like a rather quaint argument: aren't most environments that
> matter trying to avoid waiting for the disk at all? Intelligent prefetch
> and large caches -- I'd argue -- are far more important for performance
> these days.

The concurrent small-i/o problem is fundamental though. If you have an application where you care only about random concurrent reads for example, you would not want to use raidz/raidz2 currently. No amount of smartness in the application gets around this. It *is* a relevant shortcoming of raidz/raidz2 compared to raid5/raid6, even if in many cases it is not significant. If disk space is not an issue, striping across mirrors will be okay for random seeks. But if you also care about diskspace, it's a show stopper unless you can throw money at the problem.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
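[To put rough numbers on that shortcoming - a back-of-the-envelope sketch only, assuming eight drives at roughly 100 random reads per second each, not a benchmark:

  one 8-disk raidz vdev:    each small read touches every data disk in the
                            stripe, so the vdev delivers about one disk's
                            worth of concurrent random reads   -> ~100/s
  four 2-way mirror pairs:  each read is served by a single disk, and either
                            half of a pair can serve it        -> ~800/s
  8-disk RAID-5:            a small read touches only the disk holding
                            the block                          -> ~800/s

The absolute figures are idealized, but the ratio is the point being argued above.]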
Anton B. Rang
2007-Nov-16 06:36 UTC
[zfs-discuss] Macs & compatibility (was Re: Yager on ZFS)
This is clearly off-topic :-) but perhaps worth correcting -->Long-time MAC users must be getting used to having their entire world >disrupted and having to re-buy all their software. This is at least the >second complete flag-day (no forward or backwards compatibility) change >they''ve been through.Actually, no; a fair number of Macintosh applications written in 1984, for the original Macintosh, still run on machines/OSes shipped in 2006. Apple provided processor compatibility by emulating the 68000 series on PowerPC, and the PowerPC on Intel; and OS compatibility by providing essentially a virtual machine running Mac OS 9 inside Mac OS X (up through 10.4). Sadly, Mac OS 9 applications no longer run on Mac OS 10.5, so it''s true that "the world is disrupted" now for those with software written prior to 2000 or so. To make this vaguely Solaris-relevant, it''s impressive that SunOS 4.x applications still generally run on Solaris 10, at least on SPARC systems, though Sun doesn''t do processor emulation. Still not very ZFS-relevant. :-) This message posted from opensolaris.org
can you guess? <billtodd <at> metrocast.net> writes:
>
> You really ought to read a post before responding to it: the CERN study
> did encounter bad RAM (and my post mentioned that) - but ZFS usually can't
> do a damn thing about bad RAM, because errors tend to arise either
> before ZFS ever gets the data or after it has already returned and checked
> it (and in both cases, ZFS will think that everything's just fine).

According to the memtest86 author, corruption most often occurs at the moment memory cells are written to, by causing bitflips in adjacent cells. So when a disk DMAs data to RAM and corruption occurs as the DMA operation writes to the memory cells, ZFS will detect the corruption when it subsequently verifies the checksum. Therefore ZFS is perfectly capable (and even likely) to detect memory corruption during simple read operations from a ZFS pool.

Of course there are other cases where neither ZFS nor any other checksumming filesystem is capable of detecting anything (e.g. the sequence of events: data is corrupted, checksummed, written to disk).

-- Marc Bevand
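[The principle is easy to demonstrate at the file level with ordinary tools - a loose analogy only, since ZFS checksums individual blocks inside the filesystem rather than whole files: record a digest, corrupt a byte, and re-verify. The file name is hypothetical and GNU coreutils is assumed (on Solaris, digest -a sha256 plays the same role):

  sha256sum photo.nef > photo.nef.sum                                  # record the checksum
  dd if=/dev/zero of=photo.nef bs=1 count=1 seek=12345 conv=notrunc    # overwrite one byte (assumes it wasn't already zero)
  sha256sum -c photo.nef.sum                                           # reports "photo.nef: FAILED"

Corruption introduced after the checksum was recorded is caught on verification; corruption introduced before it is not - which is exactly the distinction being argued here.]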
Toby Thain
2007-Nov-16 11:36 UTC
[zfs-discuss] Macs & compatibility (was Re: Yager on ZFS)
On 16-Nov-07, at 4:36 AM, Anton B. Rang wrote:> This is clearly off-topic :-) but perhaps worth correcting -- > >> Long-time MAC users must be getting used to having their entire world >> disrupted and having to re-buy all their software. This is at >> least the >> second complete flag-day (no forward or backwards compatibility) >> change >> they''ve been through. > > Actually, no; a fair number of Macintosh applications written in > 1984, for the original Macintosh, still run on machines/OSes > shipped in 2006. Apple provided processor compatibility by > emulating the 68000 series on PowerPC, and the PowerPC on Intel;Absolutely Anton, original poster deserves firm correction. Very little broke in either transition; Apple had excellent success with fast and reliable emulation (68K, classic runtime on OS X, PPC on Rosetta).> and OS compatibility by providing essentially a virtual machine > running Mac OS 9 inside Mac OS X (up through 10.4). > > Sadly, Mac OS 9 applications no longer run on Mac OS 10.5, so it''s > true that "the world is disrupted" now for those with software > written prior to 2000 or so.I will miss MPW. I wish they would release sources so we could bring it native to OS X. --Toby (Mac user since 1986 or so).> > To make this vaguely Solaris-relevant, it''s impressive that SunOS > 4.x applications still generally run on Solaris 10, at least on > SPARC systems, though Sun doesn''t do processor emulation. Still not > very ZFS-relevant. :-) > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> can you guess? <billtodd <at> metrocast.net> writes: > > > > You really ought to read a post before responding > to it: the CERN study > > did encounter bad RAM (and my post mentioned that) > - but ZFS usually can''t > > do a damn thing about bad RAM, because errors tend > to arise either > > before ZFS ever gets the data or after it has > already returned and checked > > it (and in both cases, ZFS will think that > everything''s just fine). > > According to the memtest86 author, corruption most > often occurs at the moment > memory cells are written to, by causing bitflips in > adjacent cells. So when a > disk DMA data to RAM, and corruption occur when the > DMA operation writes to > the memory cells, and then ZFS verifies the checksum, > then it will detect the > corruption. > > Therefore ZFS is perfectly capable (and even likely) > to detect memory > corruption during simple read operations from a ZFS > pool. > > Of course there are other cases where neither ZFS nor > any other checksumming > filesystem is capable of detecting anything (e.g. the > sequence of events: data > is corrupted, checksummed, written to disk).Indeed - the latter was the first of the two scenarios that I sketched out. But at least on the read end of things ZFS should have a good chance of catching errors due to marginal RAM. That must mean that most of the worrisome alpha-particle problems of yore have finally been put to rest (since they''d be similarly likely to trash data on the read side after ZFS had verified it). I think I remember reading that somewhere at some point, but I''d never gotten around to reading that far in the admirably-detailed documentation that accompanies memtest: thanks for enlightening me. - bill This message posted from opensolaris.org
> Getting back to ''consumer'' use for a moment, though, > given that something like 90% of consumers entrust > their PC data to the tender mercies of Windows, and a > large percentage of those neither back up their data, > nor use RAID to guard against media failures, nor > protect it effectively from the perils of Internet > infection, it would seem difficult to assert that > whatever additional protection ZFS may provide would > make any noticeable difference in the consumer space > - and that was the kind of reasoning behind my > comment that began this sub-discussion.As a consumer at home, IT guy at work and amateur photographer, I think ZFS will help change that. Here''s what I think photogs evolve through: 1) What are negatives? - Mom/dad taking holiday photos 2) Keep negatives in the envelope - average snapshot photog 3) Keep them filed in boxes - started snapping with a SLR? Might be doing darkroom work 4) Get acid free boxes - pro/am. 5) Store slides in archival environment (humidity, temp, etc). - obsessive In the digital world: 1) keeps them on the card until printed. Only keeps the print 2) copies them to disk & erases them off the card. Gets burned when system disk dies 2a) puts them on CD/DVD. Gets burned a little when the disk dies and some photos not on CD/DVDs yet. 3a) gets an external USB drive to store things. Gets burned when that disk dies. 3b) run raid in the box. 3c) gets an external RAID disk (buffalo/ReadyNAS, etc). 4) archives to multiple places. etc... 5) gets ZFS and does transfer direct to local disk from flash card. Today I can build a Solaris file server for a reasonable price with off the shelf parts ($300 + disks). I can''t get near that for a WAFL based system. The only WAFL I can get is only on networked storage which fails 5) for the obsessed. I can see ZFS coming to ready made networked RAID box that a pro-am photographer could purchase. I don''t ever see that with WAFL. And either FS on a network RAID box will be less error prone then a box running ext3/xfs as is typical now. And that''s what the ZFS hype is about IMO. As for a the viability of buying one of the boxes, look at what a pro-am photographer might buy. I bought a Nikon D100 for $1600 when it came up. A new lens for $500 and I''m interested in $1000 lenses. Tripod, flash, etc. I spent lots of $$ to capture the images. I''ll spend similar to keep them. This message posted from opensolaris.org
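[To illustrate stage 5, the whole setup really is only a few commands; a sketch assuming two spare disks at c1t0d0 and c1t1d0 and a card reader mounted at /media/card (all names hypothetical):

  zpool create photos mirror c1t0d0 c1t1d0        # mirrored pool, mounted at /photos
  zfs create photos/raw                           # a filesystem for the raw files
  cp -r /media/card/DCIM /photos/raw/2007-11-29   # pull the shoot off the flash card
  zfs snapshot photos/raw@2007-11-29              # cheap point-in-time copy before editing

]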
On 29-Nov-07, at 2:48 PM, Tom Buskey wrote:>> Getting back to ''consumer'' use for a moment, though, >> given that something like 90% of consumers entrust >> their PC data to the tender mercies of Windows, and a >> large percentage of those neither back up their data, >> nor use RAID to guard against media failures, nor >> protect it effectively from the perils of Internet >> infection, it would seem difficult to assert that >> whatever additional protection ZFS may provide would >> make any noticeable difference in the consumer space >> - and that was the kind of reasoning behind my >> comment that began this sub-discussion. > > As a consumer at home, IT guy at work and amateur photographer, I > think ZFS will help change that. ... > 5) gets ZFS and does transfer direct to local disk from flash card. > > Today I can build a Solaris file server for a reasonable price with > off the shelf parts ($300 + disks). I can''t get near that for a > WAFL based system. The only WAFL I can get is only on networked > storage which fails 5) for the obsessed. > > I can see ZFS coming to ready made networked RAID box that a pro-am > photographer could purchase.Xserve + Xserve RAID... ZFS is already in OS X 10.5. As easy to set up and administer as any OS X system; a problem free and FAST network server to Macs or PCs. http://www.apple.com/xserve/ --Toby> I don''t ever see that with WAFL. And either FS on a network RAID > box will be less error prone then a box running ext3/xfs as is > typical now. > > And that''s what the ZFS hype is about IMO. > > As for a the viability of buying one of the boxes, look at what a > pro-am photographer might buy. ...
On 11/29/07, Toby Thain <toby at smartgames.ca> wrote:
> Xserve + Xserve RAID... ZFS is already in OS X 10.5.
>
> As easy to set up and administer as any OS X system; a problem free
> and FAST network server to Macs or PCs.

That is a great theory ... we have a number of Xserves with Xraids. No ZFS on Mac OS X (yet), so we are running HFS+. The problem is that HFS+ is such a pig that some backups never functionally complete (one server has about 4-5 TB and millions of files, not the large media files that HFS+ seems to be optimized for). About 18 months ago we had a scheduled server room power outage. We brought everything down cleanly. On bringing it back up the volume from the Xraid was corrupt. Apple's only answer (after much analysis) was to reload from backup. Thankfully this was not the server with the millions of files, but one that we did have good backups of. We have been terrified every time we have had to restart the server with the millions of files.

Not technology that I would want to trust my photos to, at least until there are better recovery tools out there. But then again, I have a similar issue with ZFS. So far there haven't been enough odd failures to cause real recovery tools to be written. Eventually there will be tools to reconstruct as much of the metadata as possible after a disaster (there always are for true Enterprise systems), but not yet.

-- 
Paul Kraus
Albacon 2008 Facilities
On 29-Nov-07, at 4:09 PM, Paul Kraus wrote:> On 11/29/07, Toby Thain <toby at smartgames.ca> wrote: > >> Xserve + Xserve RAID... ZFS is already in OS X 10.5. >> >> As easy to set up and administer as any OS X system; a problem free >> and FAST network server to Macs or PCs. > > That is a great theory ... we have a number of Xserves with > Xraids. No ZFS on Mac OS X (yet),10.5.> so we are running HFS+. The problem > is that HFS+ is such a pig that some backups never functional complete > (one server has about 4-5 TB and millions of files, not the large > media files that HFS+ seems to be optimized for).You might also enjoy some of the alternative Xserve configurations described at http://alienraid.org/ I would not expect to see such scaling issues with Linux on Xserve, for example.> About 18 months ago > we had a scheduled server room power outage. We brought everything > down cleanly. On bringing it back up the volume from the Xraid was > corrupt. Apple''s only answer (after much analysis) was to reload from > backup. Thankfully this was not the server with the millions of files, > but one that we did have good backups of. We have been terrified every > time we have had to restart the server with the millions of files. > > Not technology that I would want to trust my photos to, at > least until there are better recovery tools out there. But then again, > I have a similar issue with ZFS. So far there haven''t been enough odd > failures to cause real recovery tools to be written.Recovery != backup.> Eventually there > will be tools to reconstruct as much of the metadata as possible after > a disaster (there always are for true Enterprise systems), but not > yet.Sounds like you have answered all your questions about ZFS already. --Toby> > -- > Paul Kraus > Albacon 2008 Facilities > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On 11/29/07, Toby Thain <toby at smartgames.ca> wrote:> > That is a great theory ... we have a number of Xserves with > > Xraids. No ZFS on Mac OS X (yet), > > 10.5.Last I looked they were only supporting read only ZFS under 10.5. Also, based on the experiences of a number of my coworkers, 10.5 is nowhere near ready for real production use.> > Eventually there > > will be tools to reconstruct as much of the metadata as possible after > > a disaster (there always are for true Enterprise systems), but not > > yet. > > Sounds like you have answered all your questions about ZFS already.Yup. My comment was more about MacOSX, Xserve, and Xraid suitability in situations where reliability is key than ZFS. -- Paul Kraus Albacon 2008 Facilities
[Zombie thread returns from the grave...]> > Getting back to ''consumer'' use for a moment, > though, > > given that something like 90% of consumers entrust > > their PC data to the tender mercies of Windows, and > a > > large percentage of those neither back up their > data, > > nor use RAID to guard against media failures, nor > > protect it effectively from the perils of Internet > > infection, it would seem difficult to assert that > > whatever additional protection ZFS may provide > would > > make any noticeable difference in the consumer > space > > - and that was the kind of reasoning behind my > > comment that began this sub-discussion. > > As a consumer at home, IT guy at work and amateur > photographer, I think ZFS will help change that.Let''s see, now: Consumer at home? OK so far. IT guy at work? Nope, nothing like a mainstream consumer, who doesn''t want to know about anything like the level of detail under discussion here. Amateur photographer? Well, sort of - except that you seem to be claiming to have reached the *final* stage of evolution that you lay out below, which - again - tends to place you *well* out of the mainstream. Try reading my paragraph above again and seeing just how closely it applies to people like you.> Here''s what I think photogs evolve through: > ) What are negatives? - Mom/dad taking holiday > photos > 2) Keep negatives in the envelope - average snapshot > photog > 3) Keep them filed in boxes - started snapping with a > SLR? Might be doing darkroom work > 4) Get acid free boxes - pro/am. > 5) Store slides in archival environment (humidity, > temp, etc). - obsessive > > In the digital world: > 1) keeps them on the card until printed. Only keeps > the print > 2) copies them to disk & erases them off the card. > Gets burned when system disk dies > 2a) puts them on CD/DVD. Gets burned a little when the > disk dies and some photos not on CD/DVDs yet.OK so far. My wife is an amateur photographer and that''s the stage where she''s at. Her parents, however, are both retired *professional* photographers - and that''s where they''re at as well.> 3a) gets an external USB drive to store things. Gets > burned when that disk dies.That sounds as if it should have been called ''2b'' rather than ''3a'', since there''s still only one copy of the data.> 3b) run raid in the box. > 3c) gets an external RAID disk (buffalo/ReadyNAS, > etc).While these (finally) give you some redundancy, they don''t protect against loss due to user errors, system errors, or virii (well, an external NAS might help some with the last two, but not a simple external RAID). They also cost significantly more (and are considerably less accessible to the average consumer) than simply keeping a live copy on your system plus an archive copy (better yet, *two* archive copies) on DVDs (the latter is what my wife and her folks do for any photos they care about).> 4) archives to multiple places. 
> etc...

At which point you find out that you didn't need RAID after all (see above): you just leave the photos on a flash card (which are dirt-cheap these days) and your system disk until they've been copied to the archive media.

> 5) gets ZFS and does transfer direct to local disk
> from flash card.

Which doesn't give you any data redundancy at all unless you're using multiple drives (guess how many typical consumers do) and doesn't protect you from user errors, system errors, or virii (unless you use an external NAS to help with the last two - again, guess how many typical consumers do) - and you'd *still* arguably be better off using the approach I described in my previous paragraph (since there's nothing like off-site storage if you want *real* protection). In other words, you can't even make the ZFS case for the final-stage semi-professional photographer above, let alone anything remotely resembling a 'consumer': you'd just really, really like to justify something that you've become convinced is hot. There's obviously been some highly-effective viral marketing at work here.

> Today I can build a Solaris file server for a
> reasonable price with off the shelf parts ($300 +
> disks).

*Build* a file server? You must be joking: if a typical consumer wants to *buy* a file server they can do so (though I'm not sure that a large percentage of 'typical' consumers actually *have* done so) - but expecting them to go out and shop for one running ZFS is - well, 'hopelessly naive' doesn't begin to do the idea justice.

> I can't get near that for a WAFL based
> system.

Please don't try to reintroduce WAFL into the consumer part of this discussion: I thought we'd finally succeeded in separating the sub-threads.

...

> I can see ZFS coming to ready made networked RAID box
> that a pro-am photographer could purchase.

*If* s/he had any interest in ZFS per se - see above.

> I don't
> ever see that with WAFL. And either FS on a network
> RAID box will be less error prone then a box running
> ext3/xfs as is typical now.

'Less error prone', while a nice marketing message, doesn't actually mean squat: the question is whether it's *sufficiently* less error-prone to be significant to the average user, and AFAICT the answer is a definite "No".

- bill

This message posted from opensolaris.org
I never said I was a typical consumer. After all, I bought a $1600 DSLR. If you look around photo forums, you''ll see an interest the digital workflow which includes long term storage and archiving. A chunk of these users will opt for an external RAID box (10%? 20%?). I suspect ZFS will change that game in the future. In particular for someone doing lots of editing, snapshots can help recover from user error. This message posted from opensolaris.org
Your response here appears to refer to a different post in this thread.> I never said I was a typical consumer.Then it''s unclear how your comment related to the material which you quoted (and hence to which it was apparently responding).> If you look around photo forums, you''ll see an > interest the digital workflow which includes long > term storage and archiving. A chunk of these users > will opt for an external RAID box (10%? 20%?). I > suspect ZFS will change that game in the future. In > particular for someone doing lots of editing, > snapshots can help recover from user error.Ah - so now the rationalization has changed to snapshot support. Unfortunately for ZFS, snapshot support is pretty commonly available (e.g., in Linux''s LVM - and IIRC BSD''s as well - if you''re looking at open-source solutions) so anyone who actually found this feature important has had access to it for quite a while already. And my original comment which you quoted still obtains as far as typical consumers are concerned. - bill This message posted from opensolaris.org
> > On 11/7/07, can you guess? > billtodd at metrocast.net > > wrote: > However, ZFS is not the *only* open-source approach > which may allow that to happen, so the real question > becomes just how it compares with equally inexpensive > current and potential alternatives (and that would > make for an interesting discussion that I''m not sure > I have time to initiate tonight). > > - billHi bill, only a question: I''m an ex linux user migrated to solaris for zfs and its checksumming; you say there are other open-source alternatives but, for a linux end user, I''m aware only of Oracle btrfs (http://oss.oracle.com/projects/btrfs/), who is a Checksumming Copy on Write Filesystem not in a final state. what *real* alternatives are you referring to??? if I missed something tell me, and I''ll happily stay with linux with my data checksummed and snapshotted. bye --- Stefano Spinucci This message posted from opensolaris.org
> > > On 11/7/07, can you guess? > > billtodd at metrocast.net > > > wrote: > > However, ZFS is not the *only* open-source > approach > > which may allow that to happen, so the real > question > > becomes just how it compares with equally > inexpensive > > current and potential alternatives (and that would > > make for an interesting discussion that I''m not > sure > > I have time to initiate tonight). > > > > - bill > > Hi bill, only a question: > I''m an ex linux user migrated to solaris for zfs and > its checksumming;So the question is: do you really need that feature (please quantify that need if you think you do), or do you just like it because it makes you feel all warm and safe? Warm and safe is definitely a nice feeling, of course, but out in the real world of corporate purchasing it''s just one feature out of many ''nice to haves'' - and not necessarily the most important. In particular, if the *actual* risk reduction turns out to be relatively minor, that nice ''feeling'' doesn''t carry all that much weight. you say there are other open-source> alternatives but, for a linux end user, I''m aware > only of Oracle btrfs > (http://oss.oracle.com/projects/btrfs/), who is a > Checksumming Copy on Write Filesystem not in a final > state. > > what *real* alternatives are you referring to???As I said in the post to which you responded, I consider ZFS''s ease of management to be more important (given that even in high-end installations storage management costs dwarf storage equipment costs) than its real but relatively marginal reliability edge, and that''s the context in which I made my comment about alternatives (though even there if ZFS continues to require definition of mirror pairs and parity groups for redundancy that reduces its ease-of-management edge, as does its limitation to a single host system in terms of ease-of-scaling). Specifically, features like snapshots, disk scrubbing (to improve reliability by dramatically reducing the likelihood of encountering an unreadable sector during a RAID rebuild), and software RAID (to reduce hardware costs) have been available for some time in Linux and FreeBSD, and canned management aids would not be difficult to develop if they don''t exist already. The dreaded ''write hole'' in software RAID is a relatively minor exposure (since it only compromises data if a system crash or UPS failure - both rare events in an enterprise setting - sneaks in between a data write and the corresponding parity update and then, before the array has restored parity consistency in the background, a disk dies) - and that exposure can be reduced to seconds by a minuscule amount of NVRAM that remembers which writes were active (or to zero with somewhat more NVRAM to remember the updates themselves in an inexpensive hardware solution). The real question is usually what level of risk an enterprise storage user is willing to tolerate. At the paranoid end of the scale reside the users who will accept nothing less than z-series or Tandem-/Stratus-style end-to-end hardware checking from the processor traces on out - which rules out most environments that ZFS runs in (unless Sun''s N-series telco products might fill the bill: I''m not very familiar with them). 
And once you get down into users of commodity processors, the risk level of using stable and robust file systems that lack ZFS''s additional integrity checks is comparable to the risk inherent in the rest of the system (at least if the systems are carefully constructed, which should be a given in an enterprise setting) - so other open-source solutions are definitely in play there. All things being equal, of course users would opt for even marginally higher reliability - but all things are never equal. If using ZFS would require changing platforms or changing code, that''s almost certainly a show-stopper for enterprise users. If using ZFS would compromise performance or require changes in management practices (e.g., to accommodate file-system-level quotas), those are at least significant impediments. In other words, ZFS has its pluses and minuses just as other open-source file systems do, and they *all* have the potential to start edging out expensive proprietary solutions in *some* applications (and in fact have already started to do so). When we move from ''current'' to ''potential'' alternatives, the scope for competition widens. Because it''s certainly possible to create a file system that has all of ZFS''s added reliability but runs faster, scales better, incorporates additional useful features, and is easier to manage. That discussion is the one that would take a lot of time to delve into adequately (and might be considered off topic for this forum - which is why I''ve tried to concentrate here on improvements that ZFS could actually incorporate without turning it upside down). - bill This message posted from opensolaris.org
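[For concreteness, the Linux facilities referred to above are each a command or two; a sketch assuming a volume group vg0 with a logical volume named data and a software-RAID array md0 (all names hypothetical):

  lvcreate --snapshot --size 2G --name data-snap /dev/vg0/data   # copy-on-write snapshot of the volume
  echo check > /sys/block/md0/md/sync_action                     # start a background scrub of the md array
  cat /proc/mdstat                                               # watch scrub/rebuild progress

]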
On 4-Dec-07, at 9:35 AM, can you guess? wrote:> Your response here appears to refer to a different post in this > thread. > >> I never said I was a typical consumer. > > Then it''s unclear how your comment related to the material which > you quoted (and hence to which it was apparently responding). > >> If you look around photo forums, you''ll see an >> interest the digital workflow which includes long >> term storage and archiving. A chunk of these users >> will opt for an external RAID box (10%? 20%?). I >> suspect ZFS will change that game in the future. In >> particular for someone doing lots of editing, >> snapshots can help recover from user error. > > Ah - so now the rationalization has changed to snapshot support. > Unfortunately for ZFS, snapshot support is pretty commonly availableWe can cherry pick features all day. People choose ZFS for the combination (as well as its unique features). --Toby> (e.g., in Linux''s LVM - and IIRC BSD''s as well - if you''re looking > at open-source solutions) so anyone who actually found this feature > important has had access to it for quite a while already. > > And my original comment which you quoted still obtains as far as > typical consumers are concerned. > > - bill > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On 5-Dec-07, at 4:19 AM, can you guess? wrote:>>>> On 11/7/07, can you guess? >>> billtodd at metrocast.net >>>> wrote: >>> However, ZFS is not the *only* open-source >> approach >>> which may allow that to happen, so the real >> question >>> becomes just how it compares with equally >> inexpensive >>> current and potential alternatives (and that would >>> make for an interesting discussion that I''m not >> sure >>> I have time to initiate tonight). >>> >>> - bill >> >> Hi bill, only a question: >> I''m an ex linux user migrated to solaris for zfs and >> its checksumming; > > So the question is: do you really need that feature (please > quantify that need if you think you do), or do you just like it > because it makes you feel all warm and safe? > > Warm and safe is definitely a nice feeling, of course, but out in > the real world of corporate purchasing it''s just one feature out of > many ''nice to haves'' - and not necessarily the most important. In > particular, if the *actual* risk reduction turns out to be > relatively minor, that nice ''feeling'' doesn''t carry all that much > weight.On the other hand, it''s hard to argue for risk *increase* (using something else)... --Toby> > you say there are other open-source >> alternatives but, for a linux end user, I''m aware >> only of Oracle btrfs >> (http://oss.oracle.com/projects/btrfs/), who is a >> Checksumming Copy on Write Filesystem not in a final >> state. >> >> what *real* alternatives are you referring to??? > > As I said in the post to which you responded, I consider ZFS''s ease > of management to be more important (given that even in high-end > installations storage management costs dwarf storage equipment > costs) than its real but relatively marginal reliability edge, and > that''s the context in which I made my comment about alternatives > (though even there if ZFS continues to require definition of mirror > pairs and parity groups for redundancy that reduces its ease-of- > management edge, as does its limitation to a single host system in > terms of ease-of-scaling). > > Specifically, features like snapshots, disk scrubbing (to improve > reliability by dramatically reducing the likelihood of encountering > an unreadable sector during a RAID rebuild), and software RAID (to > reduce hardware costs) have been available for some time in Linux > and FreeBSD, and canned management aids would not be difficult to > develop if they don''t exist already. The dreaded ''write hole'' in > software RAID is a relatively minor exposure (since it only > compromises data if a system crash or UPS failure - both rare > events in an enterprise setting - sneaks in between a data write > and the corresponding parity update and then, before the array has > restored parity consistency in the background, a disk dies) - and > that exposure can be reduced to seconds by a minuscule amount of > NVRAM that remembers which writes were active (or to zero with > somewhat more NVRAM to remember the updates themselves in an > inexpensive hardware solution). > > The real question is usually what level of risk an enterprise > storage user is willing to tolerate. At the paranoid end of the > scale reside the users who will accept nothing less than z-series > or Tandem-/Stratus-style end-to-end hardware checking from the > processor traces on out - which rules out most environments that > ZFS runs in (unless Sun''s N-series telco products might fill the > bill: I''m not very familiar with them). 
And once you get down > into users of commodity processors, the risk level of using stable > and robust file systems that lack ZFS''s additional integrity checks > is comparable to the risk inherent in the rest of the system (at > least if the systems are carefully constructed, which should be a > given in an enterprise setting) - so other open-source solutions > are definitely in play there. > > All things being equal, of course users would opt for even > marginally higher reliability - but all things are never equal. If > using ZFS would require changing platforms or changing code, that''s > almost certainly a show-stopper for enterprise users. If using ZFS > would compromise performance or require changes in management > practices (e.g., to accommodate file-system-level quotas), those > are at least significant impediments. In other words, ZFS has its > pluses and minuses just as other open-source file systems do, and > they *all* have the potential to start edging out expensive > proprietary solutions in *some* applications (and in fact have > already started to do so). > > When we move from ''current'' to ''potential'' alternatives, the scope > for competition widens. Because it''s certainly possible to create > a file system that has all of ZFS''s added reliability but runs > faster, scales better, incorporates additional useful features, and > is easier to manage. That discussion is the one that would take a > lot of time to delve into adequately (and might be considered off > topic for this forum - which is why I''ve tried to concentrate here > on improvements that ZFS could actually incorporate without turning > it upside down). > > - bill > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Hi All, I am after some ZFS success stories in the ZFS community. The stories of replacing Veritas VM/FS with ZFS in bigger data volume environments. If there is please let me know the size and type of storage used, with applications ie: DB etc. Likewise I will take a hit on horror stories if at all. (Hope not!:-)) Appreciate your help. rgds Roshan
...> >> Hi bill, only a question: > >> I''m an ex linux user migrated to solaris for zfs > and > >> its checksumming; > > > > So the question is: do you really need that > feature (please > > quantify that need if you think you do), or do you > just like it > > because it makes you feel all warm and safe? > > > > Warm and safe is definitely a nice feeling, of > course, but out in > > the real world of corporate purchasing it''s just > one feature out of > > many ''nice to haves'' - and not necessarily the most > important. In > > particular, if the *actual* risk reduction turns > out to be > > relatively minor, that nice ''feeling'' doesn''t carry > all that much > > weight. > > On the other hand, it''s hard to argue for risk > *increase* (using > something else)...And no one that I''m aware of was doing anything like that: what part of the "All things being equal" paragraph (I''ve left it in below in case you missed it the first time around) did you find difficult to understand? - bill ...> > All things being equal, of course users would opt > for even > > marginally higher reliability - but all things are > never equal. If > > using ZFS would require changing platforms or > changing code, that''s > > almost certainly a show-stopper for enterprise > users. If using ZFS > > would compromise performance or require changes in > management > > practices (e.g., to accommodate file-system-level > quotas), those > > are at least significant impediments. In other > words, ZFS has its > > pluses and minuses just as other open-source file > systems do, and > > they *all* have the potential to start edging out > expensive > > proprietary solutions in *some* applications (and > in fact have > > already started to do so).This message posted from opensolaris.org
I> >> suspect ZFS will change that game in the future. > In > > particular for someone doing lots of editing, > >> snapshots can help recover from user error. > > > > Ah - so now the rationalization has changed to > snapshot support. > > Unfortunately for ZFS, snapshot support is pretty > commonly available > > We can cherry pick features all day. People choose > ZFS for the > combination (as well as its unique features).Actually, based on the self-selected and decidedly unscientific sample of ZFS proponents that I''ve encountered around the Web lately, it appears that people choose ZFS in large part because a) they''ve swallowed the "Last Word In File Systems" viral marketing mantra hook, line, and sinker (that''s in itself not all that surprising, because the really nitty-gritty details of file system implementation aren''t exactly prime topics of household conversation - even among the technically inclined), b) they''ve incorporated this mantra into their own self-image (the ''fanboy'' phenomenon - but at least in the case of existing Sun customers this is also not very surprising, because dependency on a vendor always tends to engender loyalty - especially if that vendor is not doing all that well and its remaining customers have become increasingly desperate for good news that will reassure them). and/or c) they''re open-source zealots who''ve been sucked in by Jonathan''s recent attempt to turn the patent dispute with NetApp into something more profound than the mundane inter-corporation spat which it so clearly is. All of which certainly helps explain why so many of those proponents are so resistant to rational argument: their zeal is not technically based, just technically rationalized (as I was pointing out in the post to which you responded) - much more like the approach of a (volunteer) marketeer with an agenda than like that of an objective analyst (not to suggest that *no one* uses ZFS based on an objective appreciation of the trade-offs involved in doing so, of course - just that a lot of its more vociferous supporters apparently don''t). - bill This message posted from opensolaris.org
> > > > On 11/7/07, can you guess? > > > billtodd at metrocast.net > > > > wrote: > As I said in the post to which you responded, I > consider ZFS''s ease of management to be more > important (given that even in high-end installations > storage management costs dwarf storage equipment > costs) than its real but relatively marginal > reliability edge, and that''s the context in which I > made my comment about alternatives (though even there > if ZFS continues to require definition of mirror > pairs and parity groups for redundancy that reduces > its ease-of-management edge, as does its limitation > to a single host system in terms of > ease-of-scaling). > > Specifically, features like snapshots, disk scrubbing > (to improve reliability by dramatically reducing the > likelihood of encountering an unreadable sector > during a RAID rebuild), and software RAID (to reduce > hardware costs) have been available for some time in > Linux and FreeBSD, and canned management aids would > not be difficult to develop if they don''t exist > already. The dreaded ''write hole'' in software RAID > is a relatively minor exposure (since it only > compromises data if a system crash or UPS failure - > both rare events in an enterprise setting - sneaks in > between a data write and the corresponding parity > update and then, before the array has restored parity > consistency in the background, a disk dies) - and > that exposure can be reduced to seconds by a > minuscule amount of NVRAM that remembers which writes > were active (or to zero with somewhat more NVRAM to > remember the updates themselves in an inexpensive > hardware solution). > > The real question is usually what level of risk an > enterprise storage user is willing to tolerate. At > the paranoid end of the scale reside the users who > will accept nothing less than z-series or > Tandem-/Stratus-style end-to-end hardware checking > from the processor traces on out - which rules out > most environments that ZFS runs in (unless Sun''s > N-series telco products might fill the bill: I''m not > very familiar with them). And once you get down into > users of commodity processors, the risk level of > using stable and robust file systems that lack ZFS''s > additional integrity checks is comparable to the risk > inherent in the rest of the system (at least if the > systems are carefully constructed, which should be a > given in an enterprise setting) - so other > open-source solutions are definitely in play there. > > All things being equal, of course users would opt for > even marginally higher reliability - but all things > are never equal. If using ZFS would require changing > platforms or changing code, that''s almost certainly a > show-stopper for enterprise users. If using ZFS > would compromise performance or require changes in > management practices (e.g., to accommodate > file-system-level quotas), those are at least > significant impediments. In other words, ZFS has its > pluses and minuses just as other open-source file > systems do, and they *all* have the potential to > start edging out expensive proprietary solutions in > *some* applications (and in fact have already started > to do so). > > When we move from ''current'' to ''potential'' > alternatives, the scope for competition widens. > Because it''s certainly possible to create a file > system that has all of ZFS''s added reliability but > runs faster, scales better, incorporates additional > useful features, and is easier to manage. 
That > discussion is the one that would take a lot of time > to delve into adequately (and might be considered > off topic for this forum - which is why I''ve tried > to concentrate here on improvements that ZFS could > actually incorporate without turning it upside > down). > > - billmy personal-professional data are important (this is my valuation, and it''s an assumption you can''t dispute). my data are only digital and rapidly changing, and than I cannot print them. I have budget constraints then I can use only user-level storage. until I discovered zfs I used subversion and git, but none of them is designed to manage gigabytes of data, some to be versioned, some to be unversioned. I can''t afford silent data corruption and, if the final response is "*now* there is no *real* opensource software alternative to zfs automatic checksumming and simple snapshotting" I''ll be an happy solaris user (for data storage), an happy linux user (for everyday work), and an unhappy offline windows user (for some video-related activity I can''t do with linux). PS I think for every fully digital people own data are vital, and almost everyone would reply "NONE" at your question "what level of risk user is willing to tolerate". bye --- Stefano Spinucci This message posted from opensolaris.org
Why are we still feeding this troll? Paid trolls deserve no response and there is no value in continuing this thread. (And no guys, he isn''t being paid by NetApp.. think bigger) The troll will continue to try to downplay features of zfs and the community will counter...and on and on. This message posted from opensolaris.org
> my personal-professional data are important (this is
> my valuation, and it's an assumption you can't
> dispute).

Nor was I attempting to: I was trying to get you to evaluate ZFS's incremental risk reduction *quantitatively* (and if you actually did so you'd likely be surprised at how little difference it makes - at least if you're at all rational about assessing it).

...

> I think for every fully digital people own data are
> vital, and almost everyone would reply "NONE" at your
> question "what level of risk user is willing to
> tolerate".

The fact that appears to escape people like you is that there is *always* some risk, and you *have* to tolerate it (or not save anything at all). Therefore the issue changes to just how *much* risk you're willing to tolerate for a given amount of effort. (There's also always the possibility of silent data corruption, even if you use ZFS - because it only eliminates *some* of the causes of such corruption. If your data is corrupted in RAM during the period when ZFS is not watching over it, for example, you're SOL.)

How to *really* protect valuable data has already been thoroughly discussed in this thread, though you don't appear to have understood it. It takes multiple copies (most of them off-line), in multiple locations, with verification of every copy operation and occasional re-verification of the stored content - and ZFS helps with only part of one of these strategies (reverifying the integrity of your on-line copy). If you don't take the rest of the steps, ZFS's incremental protection is virtually useless, because the risk of data loss from causes that ZFS doesn't protect against is so much higher than the incremental protection that it provides (i.e., you may *feel* noticeably better protected but you're just kidding yourself). If you *do* take the rest of the steps, then it takes little additional effort to revalidate your on-line content as well as the off-line copies, so all ZFS provides is a small reduction in effort to achieve the same (very respectable) level of protection that other solutions can achieve when manual steps are taken to reverify the on-line copy as well as the off-line copies.

Try to step out of your "my data is valuable" rut and wrap your mind around the fact that ZFS's marginal contribution to its protection, real though it may be, just isn't very significant in most environments compared to the rest of the protection solution that it *doesn't* help with. That's why I encouraged you to *quantify* the effect that ZFS's protection features have in *your* environment (along with its other risks that ZFS can't ameliorate): until you do that, you're just another fanboy (not that there's anything wrong with that, as long as you don't try to present your personal beliefs as something of more objective validity).

- bill

This message posted from opensolaris.org
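[For what it's worth, the verification and re-verification steps described above are scriptable with ordinary tools, and ZFS covers the on-line-copy part with a scrub. A sketch with hypothetical paths and pool name, GNU coreutils assumed:

  find /photos -type f -exec sha256sum {} + > /backup/photos-2007-12.sha256   # manifest written at copy time
  sha256sum -c /backup/photos-2007-12.sha256 | grep -v ': OK$'                # later re-check: print only mismatches
  zpool scrub tank                                                            # re-verify the live copy's checksums
  zpool status -v tank                                                        # scrub progress and any checksum errors

]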
he isn''t being> paid by NetApp.. think biggerO frabjous day! Yet *another* self-professed psychic, but one whose internal voices offer different counsel. While I don''t have to be psychic myself to know that they''re *all* wrong (that''s an advantage of fact-based rather than faith-based opinions), a battle-of-the-incompetents would be amusing to watch (unless it took place in a realm which no mere mortals could visit). - bill This message posted from opensolaris.org
<trolling> can you guess? wrote:> he isn''t being > >> paid by NetApp.. think bigger >> > > O frabjous day! Yet *another* self-professed psychic, but one whose internal voices offer different counsel. > > While I don''t have to be psychic myself to know that they''re *all* wrong (that''s an advantage of fact-based rather than faith-based opinions), a battle-of-the-incompetents would be amusing to watch (unless it took place in a realm which no mere mortals could visit). > > - bill</trolling>
On Dec 5, 2007, at 17:50, can you guess? wrote:>> my personal-professional data are important (this is >> my valuation, and it''s an assumption you can''t >> dispute). > > Nor was I attempting to: I was trying to get you to evaluate ZFS''s > incremental risk reduction *quantitatively* (and if you actually > did so you''d likely be surprised at how little difference it makes > - at least if you''re at all rational about assessing it).ok .. i''ll bite since there''s no ignore feature on the list yet: what are you terming as "ZFS'' incremental risk reduction"? .. (seems like a leading statement toward a particular assumption) .. are you just trying to say that without multiple copies of data in multiple physical locations you''re not really accomplishing a more complete risk reduction yes i have read this thread, as well as many of your other posts around usenet and such .. in general i find your tone to be somewhat demeaning (slightly rude too - but - eh, who''s counting? i''m none to judge) - now, you do know that we are currently in an era of collaboration instead of deconstruction right? .. so i''d love to see the improvements on the many shortcomings you''re pointing to and passionate about written up, proposed, and freely implemented :) --- .je
That would require coming up with something solid. Much like his generalization that there''s already snapshotting and checksumming that exists for linux. yet when he was called out, he responded with a 20 page rant because there doesn''t exist such a solution. It''s far easier to condescend when called out on your BS than to actually answer the question. If there were such a solution available, it would''ve been a one line response. IE: sure, xfs has checksumming and snapshotting today in linux!!111 But alas, nothing does exist, which is exactly why there''s so much interest in zfs. "but most consumers won''t need what it provides" is a cop-out, as he knows. Just like *most consumers* don''t need more than 128kbit/sec of bandwidth, and *most consumers* didn''t need bigger than a 10MB hard drive. It turns out people tend to use the technology AFTER it''s developed. OF COURSE the need is a niche right now, just like every other technology before it. It HAS to be by the very nature that people can''t use what they don''t have. 10 years ago I couldn''t download an entire CD without waiting a couple days, and shockingly enough, there was no *consumer need* to do so. Go figure, 10 years later, the bandwidth is there, and there''s a million other technologies built up around it. But I digress, he''s already assured us all he loves ZFS and isn''t just trolling these forums. Clearly that statement trumps any and all actions that proceeded it. This message posted from opensolaris.org
On Tue, 4 Dec 2007, Stefano Spinucci wrote:>>> On 11/7/07, can you guess? >> billtodd at metrocast.net >>> wrote: >> However, ZFS is not the *only* open-source approach >> which may allow that to happen, so the real question >> becomes just how it compares with equally inexpensive >> current and potential alternatives (and that would >> make for an interesting discussion that I''m not sure >> I have time to initiate tonight). >> >> - bill > > Hi bill, only a question: > I''m an ex linux user migrated to solaris for zfs and its checksumming; you say there are other open-source alternatives but, for a linux end user, I''m aware only of Oracle btrfs (http://oss.oracle.com/projects/btrfs/), who is a Checksumming Copy on Write Filesystem not in a final state. > > what *real* alternatives are you referring to??? > > if I missed something tell me, and I''ll happily stay with linux with my data checksummed and snapshotted. > > bye > > --- > Stefano Spinucci >Hi Stefano, Did you get a *real* answer to your question? Do you think that this (quoted) message is a *real* answer? -------- can you guess? ----------- Message-ID: <151278.1196835818286.JavaMail.Twebapp at oss-app1> Date: Tue, 04 Dec 2007 22:19:54 PST From: can you guess? <billtodd at metrocast.net> To: zfs-discuss at opensolaris.org In-Reply-To: <15031034.1196824022653.JavaMail.Twebapp at oss-app1> Subject: Re: [zfs-discuss] Yager on ZFS List-Id: <zfs-discuss.opensolaris.org>> > > On 11/7/07, can you guess? > > billtodd at metrocast.net > > > wrote: > > However, ZFS is not the *only* open-source > approach > > which may allow that to happen, so the real > question > > becomes just how it compares with equally > inexpensive > > current and potential alternatives (and that would > > make for an interesting discussion that I''m not > sure > > I have time to initiate tonight). > > > > - bill > > Hi bill, only a question: > I''m an ex linux user migrated to solaris for zfs and > its checksumming;So the question is: do you really need that feature (please quantify that need if you think you do), or do you just like it because it makes you feel all warm and safe? Warm and safe is definitely a nice feeling, of course, but out in the real world of corporate purchasing it''s just one feature out of many ''nice to haves'' - and not necessarily the most important. In particular, if the *actual* risk reduction turns out to be relatively minor, that nice ''feeling'' doesn''t carry all that much weight. you say there are other open-source> alternatives but, for a linux end user, I''m aware > only of Oracle btrfs > (http://oss.oracle.com/projects/btrfs/), who is a > Checksumming Copy on Write Filesystem not in a final > state. > > what *real* alternatives are you referring to???As I said in the post to which you responded, I consider ZFS''s ease of management to be more important (given that even in high-end installations storage management costs dwarf storage equipment costs) than its real but relatively marginal reliability edge, and that''s the context in which I made my comment about alternatives (though even there if ZFS continues to require definition of mirror pairs and parity groups for redundancy that reduces its ease-of-management edge, as does its limitation to a single host system in terms of ease-of-scaling). 
Specifically, features like snapshots, disk scrubbing (to improve reliability by dramatically reducing the likelihood of encountering an unreadable sector during a RAID rebuild), and software RAID (to reduce hardware costs) have been available for some time in Linux and FreeBSD, and canned management aids would not be difficult to develop if they don''t exist already. The dreaded ''write hole'' in software RAID is a relatively minor exposure (since it only compromises data if a system crash or UPS failure - both rare events in an enterprise setting - sneaks in between a data write and the corresponding parity update and then, before the array has restored parity consistency in the background, a disk dies) - and that exposure can be reduced to seconds by a minuscule amount of NVRAM that remembers which writes were active (or to zero with somewhat more NVRAM to remember the updates themselves in an inexpensive hardware solution). The real question is usually what level of risk an enterprise storage user is willing to tolerate. At the paranoid end of the scale reside the users who will accept nothing less than z-series or Tandem-/Stratus-style end-to-end hardware checking from the processor traces on out - which rules out most environments that ZFS runs in (unless Sun''s N-series telco products might fill the bill: I''m not very familiar with them). And once you get down into users of commodity processors, the risk level of using stable and robust file systems that lack ZFS''s additional integrity checks is comparable to the risk inherent in the rest of the system (at least if the systems are carefully constructed, which should be a given in an enterprise setting) - so other open-source solutions are definitely in play there. All things being equal, of course users would opt for even marginally higher reliability - but all things are never equal. If using ZFS would require changing platforms or changing code, that''s almost certainly a show-stopper for enterprise users. If using ZFS would compromise performance or require changes in management practices (e.g., to accommodate file-system-level quotas), those are at least significant impediments. In other words, ZFS has its pluses and minuses just as other open-source file systems do, and they *all* have the potential to start edging out expensive proprietary solutions in *some* applications (and in fact have already started to do so). When we move from ''current'' to ''potential'' alternatives, the scope for competition widens. Because it''s certainly possible to create a file system that has all of ZFS''s added reliability but runs faster, scales better, incorporates additional useful features, and is easier to manage. That discussion is the one that would take a lot of time to delve into adequately (and might be considered off topic for this forum - which is why I''ve tried to concentrate here on improvements that ZFS could actually incorporate without turning it upside down). - bill This message posted from opensolaris.org _______________________________________________ zfs-discuss mailing list zfs-discuss at opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -------- end of can you guess? ----------- Beep; Beep; Beep, Beep, Beep, beep beep beep beep-beep-beep Regards, Al Hopper Logical Approach Inc, Plano, TX. 
al at logical-approach.com Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ Graduate from "sugar-coating school"? Sorry - I never attended! :)
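To put a rough number on the "relatively minor exposure" claim about the software-RAID write hole in the quoted message above, here is a back-of-envelope sketch in Python. Every rate in it (crash frequency, resync time, disk failure rate, array width) is an assumed figure chosen purely for illustration, not a measurement, and the model itself is only a simplification of the argument being made:

# Rough estimate of the RAID-5 write-hole exposure described above: data is
# compromised only if a crash leaves some stripe's parity inconsistent AND a
# disk in that array dies before the background parity resync finishes.
# Every figure below is an assumed, illustrative number - not a measurement.

crashes_per_year = 0.5      # unplanned crashes / UPS failures per year
resync_hours     = 8.0      # background parity-consistency pass after a crash
disk_afr         = 0.03     # annual failure rate of a single disk
disks_in_array   = 8

hours_per_year = 365 * 24
p_disk_dies_during_resync = disks_in_array * disk_afr * resync_hours / hours_per_year
p_write_hole_loss_per_year = crashes_per_year * p_disk_dies_during_resync

print(f"P(a disk dies during one post-crash resync) ~ {p_disk_dies_during_resync:.1e}")
print(f"P(write-hole data loss in a year)           ~ {p_write_hole_loss_per_year:.1e}")
# With an NVRAM write-intent log the resync window shrinks from hours to
# seconds, scaling the second number down by roughly the same factor.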
> I was trying to get you > to evaluate ZFS''s > > incremental risk reduction *quantitatively* (and if > you actually > > did so you''d likely be surprised at how little > difference it makes > > - at least if you''re at all rational about > assessing it). > > ok .. i''ll bite since there''s no ignore feature on > the list yet: > > what are you terming as "ZFS'' incremental risk > reduction"? .. (seems > like a leading statement toward a particular > assumption)Primarily its checksumming features, since other open source solutions support simple disk scrubbing (which given its ability to catch most deteriorating disk sectors before they become unreadable probably has a greater effect on reliability than checksums in any environment where the hardware hasn''t been slapped together so sloppily that connections are flaky). Aside from the problems that scrubbing handles (and you need scrubbing even if you have checksums, because scrubbing is what helps you *avoid* data loss rather than just discover it after it''s too late to do anything about it), and aside from problems deriving from sloppy assembly (which tend to become obvious fairly quickly, though it''s certainly possible for some to be more subtle), checksums primarily catch things like bugs in storage firmware and otherwise undetected disk read errors (which occur orders of magnitude less frequently than uncorrectable read errors). Robert Milkowski cited some sobering evidence that mid-range arrays may have non-negligible firmware problems that ZFS could often catch, but a) those are hardly ''consumer'' products (to address that sub-thread, which I think is what applies in Stefano''s case) and b) ZFS''s claimed attraction for higher-end (corporate) use is its ability to *eliminate* the need for such products (hence its ability to catch their bugs would not apply - though I can understand why people who needed to use them anyway might like to have ZFS''s integrity checks along for the ride, especially when using less-than-fully-mature firmware). And otherwise undetected disk errors occur with negligible frequency compared with software errors that can silently trash your data in ZFS cache or in application buffers (especially in PC environments: enterprise software at least tends to be more stable and more carefully controlled - not to mention their typical use of ECC RAM). So depending upon ZFS''s checksums to protect your data in most PC environments is sort of like leaving on a vacation and locking and bolting the back door of your house while leaving the front door wide open: yes, a burglar is less likely to enter by the back door, but thinking that the extra bolt there made you much safer is likely foolish. .. are you> just trying to say that without multiple copies of > data in multiple > physical locations you''re not really accomplishing a > more complete > risk reductionWhat I''m saying is that if you *really* care about your data, then you need to be willing to make the effort to lock and bolt the front door as well as the back door and install an alarm system: if you do that, *then* ZFS''s additional protection mechanisms may start to become significant (because you''re eliminated the higher-probability risks and ZFS''s extra protection then actually reduces the *remaining* risk by a significant percentage). 
Conversely, if you don''t care enough about your data to take those extra steps, then adding ZFS''s incremental protection won''t reduce your net risk by a significant percentage (because the other risks that still remain are so much larger). Was my point really that unclear before? It seems as if this must be at least the third or fourth time that I''ve explained it.> > yes i have read this thread, as well as many of your > other posts > around usenet and such .. in general i find your tone > to be somewhat > demeaning (slightly rude too - but - eh, who''s > counting? i''m none to > judge)As I''ve said multiple times before, I respond to people in the manner they seem to deserve. This thread has gone on long enough that there''s little excuse for continued obtuseness at this point, but I still attempt to be pleasant as long as I''m not responding to something verging on being hostile. - now, you do know that we are currently in an> era of > collaboration instead of deconstruction right?Can''t tell it from the political climate, and corporations seem to be following that lead (I guess they''ve finally stopped just gazing in slack-jawed disbelief at what this administration is getting away with and decided to cash in on the opportunity themselves). Or were you referring to something else? .. so> i''d love to see > the improvements on the many shortcomings you''re > pointing to and > passionate about written up, proposed, and freely > implemented :)Then ask the ZFS developers to get on the stick: fixing the fragmentation problem discussed elsewhere should be easy, and RAID-Z is at least amenable to a redesign (though not without changing the on-disk metadata structures a bit - but while they''re at it, they could include support for data redundancy in a manner analogous to ditto blocks so that they could get rid of the vestigial LVM-style management in that area). Changing ZFS''s approach to snapshots from block-oriented to audit-trail-oriented, in order to pave the way for a journaled rather than shadow-paged approach to transactional consistency (which then makes data redistribution easier to allow rebalancing across not only local disks but across multiple nodes using algorithmic rather than pointer-based placement) starts to get more into a ''raze it to the ground and start over'' mode, though - leaving plenty of room for one or more extended postscripts to ''the last word in file systems''. - bill This message posted from opensolaris.org
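For readers trying to follow the scrubbing-versus-checksumming distinction being argued in this exchange, a minimal sketch may help. It is not ZFS code; read_block and expected_sha256 are hypothetical stand-ins for the storage stack and for checksums kept apart from the data:

# Minimal sketch (not ZFS code) of the distinction drawn above: a plain scrub
# finds sectors that no longer read at all, while a checksummed scrub also
# finds blocks that read back "successfully" but with the wrong contents.

import hashlib

def plain_scrub(nblocks, read_block):
    unreadable = []
    for i in range(nblocks):
        try:
            read_block(i)                 # a successful read is good enough
        except IOError:
            unreadable.append(i)          # latent unreadable sector caught early
    return unreadable

def checksummed_scrub(nblocks, read_block, expected_sha256):
    unreadable, silently_corrupt = [], []
    for i in range(nblocks):
        try:
            data = read_block(i)
        except IOError:
            unreadable.append(i)
            continue
        if hashlib.sha256(data).hexdigest() != expected_sha256[i]:
            silently_corrupt.append(i)    # read worked, contents did not match
    return unreadable, silently_corrupt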
On Wed, 5 Dec 2007, Eric Haycraft wrote: [... reformatted .... ]> Why are we still feeding this troll? Paid trolls deserve no response > and there is no value in continuing this thread. (And no guys, he > isn''t being paid by NetApp.. think bigger) The troll will continue > to try to downplay features of zfs and the community will > counter...and on and on.+1 - a troll Ques: does it matter why he''s a troll? I don''t think so.... but my best guess is that Bill is out of work, and, due to the financial hardship, has had to cut his alzheimer''s medication dosage in half. I could be wrong, with my guess, but as long as I keep seeing this "can you guess?" question, I feel compelled to answer it. :) Please feel free to offer your best "can you guess?" answer! Regards, Al Hopper Logical Approach Inc, Plano, TX. al at logical-approach.com Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ Graduate from "sugar-coating school"? Sorry - I never attended! :)
> I have budget constraints then I can use only user-level storage.
>
> until I discovered zfs I used subversion and git, but none of them is designed to manage gigabytes of data, some to be versioned, some to be unversioned.
>
> I can''t afford silent data corruption and, if the final response is "*now* there is no *real* opensource software alternative to zfs automatic checksumming and simple snapshotting" I''ll be a happy solaris user (for data storage), a happy linux user (for everyday work), and an unhappy offline windows user (for some video-related activity I can''t do with linux).

Note that I don''t wish to argue for/against zfs/billtodd but the comment above about "no *real* opensource software alternative to zfs automatic checksumming and simple snapshotting" caught my eye.

There is an open source alternative for archiving that works quite well. venti has been available for a few years now. It runs on *BSD, linux, macOS & plan9 (its native os). It uses strong crypto checksums, stored separately from the data (stored in the pointer blocks), so you get a similar guarantee against silent data corruption as ZFS.

You can back up a variety of filesystems (ufs, hfs, ext2fs, fat) or use it to back up a file tree. Each backup results in a single 45 byte "score" containing the checksum of the root pointer block. Using this score you can retrieve the entire backup. Further, it stores only one copy of a data block regardless of what files or which backup it may belong to. In effect every "full backup" is an incremental backup (only changed data blocks and changed or new ptr blocks are stored).

So it is really an "archival" server. You don''t take snapshots but you do a backup. However you can nfs mount a venti and all your backups will show up under directories like <machine>/<yyyy>/<mm><dd>/<filesystem>.

Ideally you''d store a venti on RAID storage. You can even copy a bunch of venti to another one, you can store its arenas on CDs or DVDs and so on. It is not as fast as ZFS nor anywhere near as easy to use, and its intended use is not the same as ZFS (it is not a primary filesystem). But for what it does, it is not bad at all! Unlike ZFS, it fits best where you have a fast filesystem for speed-critical use, venti for backups and RAID for redundancy.

Google for "venti sean dorward". If interested, go to http://swtch.com/plan9port/ and pick up plan9port (a collection of programs from plan9, not just venti). See http://swtch.com/plan9port/man/man8/index.html for how to use venti.

> I think for every fully digital people own data are vital, and almost everyone would reply "NONE" at your question "what level of risk user is willing to tolerate".

NONE is not possible. It is a question of how much risk you are willing to tolerate for what cost. Thankfully, these days you have a variety of choices and much much lower cost for a given degree of risk compared to just a few years ago!
can you guess? wrote:> > Primarily its checksumming features, since other open source solutions support simple disk scrubbing (which given its ability to catch most deteriorating disk sectors before they become unreadable probably has a greater effect on reliability than checksums in any environment where the hardware hasn''t been slapped together so sloppily that connections are flaky). >From what I''ve read on the subject, That premise seems bad from the start. I don''t believe that scrubbing will catch all the types of errors that checksumming will. There are a category of errors that are not caused by firmware, or any type of software. The hardware just doesn''t write or read the correct bit value this time around. With out a checksum there''s no way for the firmware to know, and next time it very well may write or read the correct bit value from the exact same spot on the disk, so scrubbing is not going to flag this sector as ''bad''. Now you may claim that this type of error happens so infrequently that it''s not worth it. You may think so since the number of bits you need to read or write to experience this is huge. However, hard disk sizes are still increasing exponentially, and the data we users are storing on them is too. I don''t believe that the distinctive makers are making corresponding improvements in the bit error rates. Therefore while it may not be a huge benefit today, it''s good we have it today, because it''s value will increase as time goes on, drive sizes and data sizes increase.> Aside from the problems that scrubbing handles (and you need scrubbing even if you have checksums, because scrubbing is what helps you *avoid* data loss rather than just discover it after it''s too late to do anything about it), and aside from problemsAgain I think you''re wrong on the basis for your point. The checksumming in ZFS (if I understand it correctly) isn''t used for only detecting the problem. If the ZFS pool has any redundancy at all, those same checksums can be used to repair that same data, thus *avoiding* the data loss. I agree that scrubbing is still a good idea. but as discussed above it won''t catch (and avoid) all the types of errors that checksumming can catch *and repair*.> deriving from sloppy assembly (which tend to become obvious fairly quickly, though it''s certainly possible for some to be more subtle), checksums primarily catch things like bugs in storage firmware and otherwise undetected disk read errors (which occur orders of magnitude less frequently than uncorrectable read errors). >Sloppy assembly isn''t the only place these errors can occur. it can occur between the head and the platter, even with the best drive and controller firmware.> Robert Milkowski cited some sobering evidence that mid-range arrays may have non-negligible firmware problems that ZFS could often catch, but a) those are hardly ''consumer'' products (to address that sub-thread, which I think is what applies in Stefano''s case) and b) ZFS''s claimed attraction for higher-end (corporate) use is its ability to *eliminate* the need for such products (hence its ability to catch their bugs would not apply - though I can understand why people who needed to use them anyway might like to have ZFS''s integrity checks along for the ride, especially when using less-than-fully-mature firmware). > >Every drive has firmware too. 
If it can be used to detect and repair array firmware problems, then it can be used by consumers to detect and repair drive firmware problems too.> And otherwise undetected disk errors occur with negligible frequency compared with software errors that can silently trash your data in ZFS cache or in application buffers (especially in PC environments: enterprise software at least tends to be more stable and more carefully controlled - not to mention their typical use of ECC RAM). > >As I wrote above. The undetected disk error rate is not improving (AFAIK) as fast as disk size and data size that these drives are used for. Therefore the value of this protection is increasing all the time. Sure it''s true that something else that could trash your data without checksumming can still trash your data with it. But making sure that the data gets unmangled if it can is still worth something, and the improvements you point out are needed in other components would be pointless (according to your argument) if something like ZFS didn''t also exist.> So depending upon ZFS''s checksums to protect your data in most PC environments is sort of like leaving on a vacation and locking and bolting the back door of your house while leaving the front door wide open: yes, a burglar is less likely to enter by the back door, but thinking that the extra bolt there made you much safer is likely foolish. > > .. are you > >> just trying to say that without multiple copies of >> data in multiple >> physical locations you''re not really accomplishing a >> more complete >> risk reduction >> > > What I''m saying is that if you *really* care about your data, then you need to be willing to make the effort to lock and bolt the front door as well as the back door and install an alarm system: if you do that, *then* ZFS''s additional protection mechanisms may start to become significant (because you''re eliminated the higher-probability risks and ZFS''s extra protection then actually reduces the *remaining* risk by a significant percentage). > >Agreed. Depending on only one copy of your important data is shortsighted. But using a tool like ZFS on at least the most active copy, if not all copies will be an improvement, if it even once stops you from having to go to your other copies. Also it''s interesting that you use the term ''alarm system''. That''s exactly how I view the checksumming features of ZFS. It is an alarm that goes off if any of my bits have been lost to an invisible ''burglar''. I''ve also noticed how you happen to skip the data replication features of ZFS. While they may not be everything you''ve hoped they would be, they are features that will have value to people who want to do exactly what you suggest, keeping multiple copies of their data in multiple places.> Conversely, if you don''t care enough about your data to take those extra steps, then adding ZFS''s incremental protection won''t reduce your net risk by a significant percentage (because the other risks that still remain are so much larger). > > Was my point really that unclear before? It seems as if this must be at least the third or fourth time that I''ve explained it. > >On the cost side of things, I think you also miss a point. The data checking *and repair* features of ZFS bring down the cost of storage not just on the cost of the software. 
It also allows (as in safeguards) the use of significantly lower priced Hardware (SATA drives instead of SAS or FCAL, or expensive arrays) by making up for the slightly higher possibility of problems that hardware brings with it. This in my opinion fundamentally changes the cost/risk ratio by giving virtually the same or better error rates on the cheaper hardware.> >> i''d love to see >> the improvements on the many shortcomings you''re >> pointing to and >> passionate about written up, proposed, and freely >> implemented :) >> > > Then ask the ZFS developers to get on the stick: fixing the fragmentation problem discussed elsewhere should be easy, and RAID-Z is at least amenable to a redesign (though not without changing the on-disk metadata structures a bit - but while they''re at it, they could include support for data redundancy in a manner analogous to ditto blocks so that they could get rid of the vestigial LVM-style management in that area). > >I think he was suggesting that if it''s so important to you, go ahead and submit the changes yourself. Though I know not all of us have the skills to do that. I''ll admit I don''t. -Kyle
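Kyle's detect-and-repair point can be illustrated with a short sketch. This is not how ZFS actually implements self-healing; read_replica and write_replica are hypothetical block accessors, and the checksum is assumed to be stored apart from the data it covers:

# Sketch of the detect-and-repair point made above: with a checksum stored
# apart from the data plus at least one redundant copy, a corrupted replica
# can be identified and rewritten from a good one rather than merely flagged.

import hashlib

def read_with_self_healing(expected_digest, read_replica, write_replica, n_replicas):
    """read_replica(i) / write_replica(i, data) are hypothetical block accessors."""
    replicas = [read_replica(i) for i in range(n_replicas)]
    ok = [hashlib.sha256(d).hexdigest() == expected_digest for d in replicas]
    if not any(ok):
        raise IOError("all replicas fail the checksum - unrecoverable block")
    good = replicas[ok.index(True)]
    for i, healthy in enumerate(ok):
        if not healthy:
            write_replica(i, good)        # repair the bad copy in place
    return good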
> On Tue, 4 Dec 2007, Stefano Spinucci wrote: > > >>> On 11/7/07, can you guess? > >> billtodd at metrocast.net > >>> wrote: > >> However, ZFS is not the *only* open-source > approach > >> which may allow that to happen, so the real > question > >> becomes just how it compares with equally > inexpensive > >> current and potential alternatives (and that would > >> make for an interesting discussion that I''m not > sure > >> I have time to initiate tonight). > >> > >> - bill > > > > Hi bill, only a question: > > I''m an ex linux user migrated to solaris for zfs > and its checksumming; you say there are other > open-source alternatives but, for a linux end user, > I''m aware only of Oracle btrfs > (http://oss.oracle.com/projects/btrfs/), who is a > Checksumming Copy on Write Filesystem not in a final > state. > > > > what *real* alternatives are you referring to??? > > > > if I missed something tell me, and I''ll happily > stay with linux with my data checksummed and > snapshotted. > > > > bye > > > > --- > > Stefano Spinucci > > > > Hi Stefano, > > Did you get a *real* answer to your question? > Do you think that this (quoted) message is a *real* > answer?Hi, Al - I see that you''re still having difficulty understanding basic English, and your other recent technical-content-free drivel here suggests that you might be better off considering a career in janitorial work than in anything requiring even basic analytical competence. But I remain willing to help you out with English until you can find the time to take a remedial course (though for help with finding a vocation more consonant with your abilities you''ll have to look elsewhere). Let''s begin by repeating the question at issue, since failing to understand that may be at the core of your problem: "what *real* alternatives are you referring to???" Despite a similar misunderstanding by your equally-illiterate associate Mr. Cook, that was not a question about what alternatives provided the specific support in which Stefano was particularly interested (though in another part of my response to him I did attempt to help him understand why that interest might be misplaced). Rather, it was a question about what *I* had referred to in an earlier post of mine, as you might also have gleaned from the first sentence of my response to that question ("As I said in the post to which you responded...") had what passes for your brain been even minimally engaged when you read it. My response to that question continued by listing some specific features (snapshots, disk scrubbing, software RAID) available in Linux and Free BSD that made them viable alternatives to ZFS for enterprise use (the context of that earlier post that I was being questioned about). Whether Linux and FreeBSD also offer management aids I admitted I didn''t know - though given ZFS''s own limitations in this area such as the need to define mirror pairs and parity groups explicitly and the inability to expand parity groups it''s not clear that lack thereof would constitute a significant drawback (especially since the management activities that their file systems require are comparable to what such enterprise installations are already used to dealing with). And, in an attempt to forestall yet another round of babble, I then addressed the relative importance (or lack thereof) of several predictable "Yes, but ZFS also offers wonderful feature X..." responses. 
Now, not being a psychic myself, I can''t state with authority that Stefano really meant to ask the question that he posed rather than something else. In retrospect, I suppose that some of his surrounding phrasing *might* suggest that he was attempting (however unskillfully) to twist my comment about other open source solutions being similarly enterprise-capable into a provably-false assertion that those other solutions offered the *same* features that he apparently considers so critical in ZFS rather than just comparably-useful ones. But that didn''t cross my mind at the time: I simply answered the question that he asked, and in passing also pointed out that those features which he apparently considered so critical might well not be. Once again, though, I''ve reached the limit of my ability to dumb down the discussion in an attempt to reach your level: if you still can''t grasp it, perhaps a friend will lend a hand. - bill This message posted from opensolaris.org
On Wed, 5 Dec 2007, can you guess? wrote: .... snip .... reformatted .....> Changing ZFS''s approach to snapshots from block-oriented to > audit-trail-oriented, in order to pave the way for a journaled > rather than shadow-paged approach to transactional consistency > (which then makes data redistribution easier to allow rebalancing > across not only local disks but across multiple nodes using > algorithmic rather than pointer-based placement) starts to get more > into a ''raze it to the ground and start over'' mode, though - leaving > plenty of room for one or more extended postscripts to ''the last > word in file systems''. > > - bill >Beep; Beep; Beep, Beep, Beep, beep beep beep beep-beep-beep Regards, Al Hopper Logical Approach Inc, Plano, TX. al at logical-approach.com Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ Graduate from "sugar-coating school"? Sorry - I never attended! :)
On Wed, 5 Dec 2007, Al Hopper wrote:> On Wed, 5 Dec 2007, Eric Haycraft wrote: > > [... reformatted .... ] > >> Why are we still feeding this troll? Paid trolls deserve no response and >> there is no value in continuing this thread. (And no guys, he isn''t being >> paid by NetApp.. think bigger) The troll will continue to try to downplay >> features of zfs and the community will counter...and on and on. > > +1 - a troll > > Ques: does it matter why he''s a troll? > I don''t think so.... but my best guess is that Bill is out of work, and, due > to the financial hardship, has had to cut his alzheimer''s > medication dosage in half. > > I could be wrong, with my guess, but as long as I keep seeing this "can you > guess?" question, I feel compelled to answer it. :) > > Please feel free to offer your best "can you guess?" answer! > > Regards, > > Al Hopper Logical Approach Inc, Plano, TX. al at logical-approach.com > Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT > OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 > http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ > Graduate from "sugar-coating school"? Sorry - I never attended! :) >Followup: Don''t you hate it when you have to followup your own email! I forgot to include the reference info that backs up my "best guess". Ref: http://www.alz.org/alzheimers_disease_what_is_alzheimers.asp Quote: "Alzheimer''s destroys brain cells, causing problems with memory, thinking and behavior severe enough to affect work, lifelong hobbies or social life. Alzheimer''s gets worse over time, and it is fatal." Quote: "Is the most common form of dementia, a general term for the loss of memory and other intellectual abilities serious enough to interfere with daily life." Quote: "Just like the rest of our bodies, our brains change as we age. Most of us notice some slowed thinking and occasional problems remembering certain things. However, serious memory loss, confusion and other major changes in the way our minds work are not a normal part of aging. They may be a sign that brain cells are failing." Regards, Al Hopper Logical Approach Inc, Plano, TX. al at logical-approach.com Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ Graduate from "sugar-coating school"? Sorry - I never attended! :)
> > I have budget constraints then I can use only > user-level storage. > > > > until I discovered zfs I used subversion and git, > but none of them is designe > > d to manage gigabytes of data, some to be > versioned, some to be unversioned. > > > > I can''t afford silent data corruption and, if the > final response is "*now* th > > ere is no *real* opensource software alternative to > zfs automatic checksummin > > g and simple snapshotting" I''ll be an happy solaris > user (for data storage), > > an happy linux user (for everyday work), and an > unhappy offline windows user > > (for some video-related activity I can''t do with > linux). > > Note that I don''t wish to argue for/against > zfs/billtodd but > the comment above about "no *real* opensource > software > alternative zfs automating checksumming and simple > snapshotting" caught my eye. > > There is an open source alternative for archiving > that works > quite well. venti has been available for a few years > now. > It runs on *BSD, linux, macOS & plan9 (its native > os). It > uses strong crypto checksums, stored separately from > the data > (stored in the pointer blocks) so you get a similar > guarantee > against silent data corruption as ZFS. > > You can back up a variety of filesystems (ufs, hfs, > ext2fs, > fat) or use it to to backup a file tree. Each backup > results > in a single 45 byte "score" containing the checksum > of root > pointer block. Using this score you can retrieve the > entire > backup. Further, it stores only one copy of a data > block > regardless of what files or which backup it may > belong to. In > effect every "full backup" is an incremental backup > (only > changed data blocks and changed or new ptr blocks are > stored). > > So it is really an "archival" server. You don''t take > snapshots but you do a backup. However you can nfs > mount a > venti and all your backups will show up under > directories > like <machine>/<yyyy>/<mm><dd>/<filesystem>. > > Ideally you''d store a venti on RAID storage. You can > even > copy a bunch of venti to another one, you can store > its > arenas on CDs or DVD and so on. > > It is not as fast as ZFS nor anywhere near as easy to > use and > its intended use is not the same as ZFS (not a > primary > filesystem). But for what it does, it is not bad at > all! > > Unlike ZFS, it fits best where you have a fast > filesystem for > speed critical use, venti for backups and RAID for > redundancy. > > Google for "venti sean dorward". If interested, go > to > http://swtch.com/plan9port/ and pick up plan9port (a > collection of programs from plan9, not just venti). > See > ttp://swtch.com/plan9port/man/man8/index.html for how > to use > venti.thank you for the suggestion. after reading something about venti I like its features and its frugality (no fuss, no hype, only a reliable fs). however, having touched zfs before venti, I admit I like zfs more and furthermore this give me a reason to use opensolaris and maybe tomorrow dump linux entirely. I''d like to have time to play with plan9port and maybe also with inferno, but for now the descent can wait.> > I think for every fully digital people own data are > vital, and almost everyon > > e would reply "NONE" at your question "what level > of risk user is willing to > > tolerate". > > NONE is not possible. It is a question of how much > risk you > are willing to tolerate for what cost. 
> Thankfully, these days you have a variety of choices and much much lower cost for a given degree of risk compared to just a few years ago!

I know zero risk is impossible, but a checksumming fs with snapshots (mirrored on two disks used alternately) is a good compromise for me (a professional-home user, with data I can''t - or I''d like not to - lose). bye --- Stefano Spinucci This message posted from opensolaris.org

Literacy has nothing to do with the glaringly obvious BS you keep spewing. Rather than answer a question - which couldn''t be answered, because you were full of it - you tried to convince us all he really didn''t know what he wanted. The assumption sure made an a$$ out of someone, but you should be used to painting yourself into a corner by now. There aren''t free alternatives in linux or freebsd that do what zfs does, period. You can keep talking in circles till you''re blue in the face, or I suppose your fingers go numb in this case, but the fact isn''t going to change. Yes, people do want zfs for any number of reasons, that''s why they''re here. You would think the fact zfs was ported to freebsd so quickly would''ve been a good first indicator that the functionality wasn''t already there. Then again, the glaringly obvious seems to consistently bypass you. I''m guessing it''s because there''s no space left in the room... your head is occupying any and all available. Never mind, your ability to admit when you''re wrong is only rivaled by your petty attempts at insults. If you''d like to answer Stefano''s question, feel free. If all you can muster is a Microsoftesque "you don''t really know what you want", I suggest giving up now. This message posted from opensolaris.org
apologies in advance for prolonging this thread .. i had considered taking this completely offline, but thought of a few people at least who might find this discussion somewhat interesting .. at the least i haven''t seen any mention of Merkle trees yet as the nerd in me yearns for On Dec 5, 2007, at 19:42, bill todd - aka can you guess? wrote:>> what are you terming as "ZFS'' incremental risk reduction"? .. >> (seems like a leading statement toward a particular assumption) > > Primarily its checksumming features, since other open source > solutions support simple disk scrubbing (which given its ability to > catch most deteriorating disk sectors before they become unreadable > probably has a greater effect on reliability than checksums in any > environment where the hardware hasn''t been slapped together so > sloppily that connections are flaky).ah .. okay - at first reading "incremental risk reduction" seems to imply an incomplete approach to risk .. putting various creators and marketing organizations pride issues aside for a moment, as a complete risk reduction - nor should it billed as such. However i do believe that an interesting use of the merkle tree with a sha256 hash is somewhat of an improvement over conventional volume based data scrubbing techniques since there can be a unique integration between the hash tree for the filesystem block layout and a hierarchical data validation method. In addition to the finding unknown areas with the scrub, you''re also doing relatively inexpensive data validation checks on every read.> Aside from the problems that scrubbing handles (and you need > scrubbing even if you have checksums, because scrubbing is what > helps you *avoid* data loss rather than just discover it after it''s > too late to do anything about it), and aside from problems deriving > from sloppy assembly (which tend to become obvious fairly quickly, > though it''s certainly possible for some to be more subtle), > checksums primarily catch things like bugs in storage firmware and > otherwise undetected disk read errors (which occur orders of > magnitude less frequently than uncorrectable read errors).sure - we''ve seen many transport errors, as well as firmware implementation errors .. in fact with many arrays we''ve seen data corruption issues with the scrub (particularly if the checksum is singly stored along with the data block) - just like spam you really want to eliminate false positives that could indicate corruption where there isn''t any. 
if you take some time to read the on disk format for ZFS you''ll see that there''s a tradeoff that''s done in favor of storing more checksums in many different areas instead of making more room for direct block pointers.> Robert Milkowski cited some sobering evidence that mid-range arrays > may have non-negligible firmware problems that ZFS could often > catch, but a) those are hardly ''consumer'' products (to address that > sub-thread, which I think is what applies in Stefano''s case) and b) > ZFS''s claimed attraction for higher-end (corporate) use is its > ability to *eliminate* the need for such products (hence its > ability to catch their bugs would not apply - though I can > understand why people who needed to use them anyway might like to > have ZFS''s integrity checks along for the ride, especially when > using less-than-fully-mature firmware).actually on this list we''ve seen a number of consumer level products including sata controllers, and raid cards (which are also becoming more commonplace in the consumer realm) that can be confirmed to throw data errors. Code maturity issues aside, there aren''t very many array vendors that are open-sourcing their array firmware - and if you consider zfs as a feature-set that could function as a multi- purpose storage array (systems are cheap) - i find it refreshing that everything that''s being done under the covers is really out in the open.> And otherwise undetected disk errors occur with negligible > frequency compared with software errors that can silently trash > your data in ZFS cache or in application buffers (especially in PC > environments: enterprise software at least tends to be more stable > and more carefully controlled - not to mention their typical use of > ECC RAM). > > So depending upon ZFS''s checksums to protect your data in most PC > environments is sort of like leaving on a vacation and locking and > bolting the back door of your house while leaving the front door > wide open: yes, a burglar is less likely to enter by the back > door, but thinking that the extra bolt there made you much safer is > likely foolish.granted - it''s not an all-in-one solution, but by combining the merkle tree approach with the sha256 checksum along with periodic data scrubbing - it''s a darn good approach .. particularly since it also tends to cost a lot less than what you might have to pay elsewhere for something you can''t really see inside.> Conversely, if you don''t care enough about your data to take those > extra steps, then adding ZFS''s incremental protection won''t reduce > your net risk by a significant percentage (because the other risks > that still remain are so much larger). > > Was my point really that unclear before? It seems as if this must > be at least the third or fourth time that I''ve explained it.not at all, disasters happen in many ways and forms and one must put in place strategies and protections to deal with as much as you can see - granted you can never cover all your bases and disasters can always find their way through .. but you do seem to be repeating the phrase "incremental protection" recently which i think i take issue with. If you really think about it, everything in life is pretty much incremental (even if the size of the increments might vary widely) - checksums and scrubbing are only a piece of the larger data protection schemes. 
This should really be used along with snapshots, replication, and backup - but i thought that was a given considering what''s already built into the filesystem and the wealth of other tools we try to share in Solaris. <snip> <snip> <snip> too many problems to address .. too little time --- .je
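Since Merkle trees keep coming up in this exchange, a condensed sketch of the idea may be useful: each pointer carries the SHA-256 of the block it references, so the root hash covers the whole tree and every read can be validated on the way in. This is a simplified illustration of the general technique, not the actual ZFS block-pointer or on-disk layout:

# Rough sketch of the Merkle-tree idea discussed in this thread.  Simplified
# illustration only - not the real ZFS metadata format.

import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_hash_tree(blocks):
    """Return the list of levels, leaves first; the last level is the root."""
    level = [sha(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [sha(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verified_read(blocks, levels, i):
    """Read block i and check it against the hash recorded one level up."""
    data = blocks[i]                      # stand-in for the actual disk read
    if sha(data) != levels[0][i]:
        raise IOError(f"checksum mismatch on block {i}")
    return data

# A scrub is then just verified_read over every block, plus re-deriving the
# interior levels to confirm they still hash up to the stored root.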
> what are you terming as "ZFS'' incremental risk reduction"?

I''m not Bill, but I''ll try to explain.

Compare a system using ZFS to one using another file system -- say, UFS, XFS, or ext3. Consider which situations may lead to data loss in each case, and the probability of each such situation. The difference between those two sets is the ''incremental risk reduction'' provided by ZFS.

So, for instance, assuming you''re using ZFS RAID in the first case, and a traditional RAID implementation in the second case:

* Single-disk failure ==> same probability of occurrence, no data loss in either case.

* Double-disk failure ==> same probability of occurrence, no data loss in either case (assuming RAID6/RAIDZ2; or data loss assuming RAID5/RAIDZ)

* Uncorrectable read error ==> same probability of occurrence, no data loss in either case

* Single-bit error on the wire ==> same, no data loss in either case

* Multi-bit error on the wire, detected by CRC ==> same, no data loss

* Multi-bit error on the wire ==> This is the first interesting case (since it differs). This is a case where ZFS will correct the error, and the standard RAID will not. The probability of occurrence is hard to compute, since it depends on the distribution of bit errors on the wire, which aren''t really independent. Roughly, though, since the wire transfers usually use a 32-bit CRC, the probability of an undetected error is 2^-32, or 0.000 000 023 2%. [You could ask whether this is true for real data. It appears to be; see "Performance of Checksums and CRCs over Real Data" by Stone, Greenwald, Partridge & Hughes.]

* Error in the file system code ==> Another interesting case, but we don''t have sufficient data to gauge probabilities.

* Undetected error in host memory ==> same probability of occurrence, same data loss.

* Undetected error in RAID memory ==> same probability, but data loss in non-ZFS case. We can estimate the probability of this, but I don''t have current data. Single-bit errors were measured at a rate of 2*10^-12 on a number of systems in the mid-1990s (see "Single Event Upset at Ground Level" by Eugene Normand). If the bits are separated spatially (as is normally done), the probability of a double-bit error is roughly 4*10^-24, and of a triple-bit error, 8*10^-36. So an undetected error is very, VERY unlikely, at least from RAM cell effects. But ZFS can correct it, if it happens.

* Failure of facility (e.g. fire, flood, power surge) ==> same/total loss of data. [ Total loss if you don''t have a backup, of course. ]

... go on as desired. This message posted from opensolaris.org
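For anyone who wants to check the figures in the message above, the arithmetic works out as follows (a throwaway Python snippet; the 2*10^-12 rate is the mid-1990s measurement cited there, and the squaring/cubing assumes the spatially separated bits fail independently, as the post does):

# Reproducing the arithmetic in the message above.
p_past_crc32 = 2 ** -32                    # undetected multi-bit wire error
print(f"past a 32-bit CRC: {p_past_crc32:.3e}  ({p_past_crc32 * 100:.10f} %)")

p_single = 2e-12                           # mid-1990s single-bit upset rate
p_double = p_single ** 2                   # spatially separated, ~independent
p_triple = p_single ** 3
print(f"double-bit: {p_double:.0e}   triple-bit: {p_triple:.0e}")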
> Now, not being a psychic myself, I can''t state with authority that Stefano really meant to ask the question that he posed rather than something else. In retrospect, I suppose that some of his surrounding phrasing *might* suggest that he was attempting (however unskillfully) to twist my comment about other open source solutions being similarly enterprise-capable into a provably-false assertion that those other solutions offered the *same* features that he apparently considers so critical in ZFS rather than just comparably-useful ones. But that didn''t cross my mind at the time: I simply answered the question that he asked, and in passing also pointed out that those features which he apparently considered so critical might well not be.

dear bill, my question was honest and, as I stated before: I''m a linux user who discovered zfs and I''d like to use it to store (versioned and checksummed) valuable data.

Then, if there are no alternatives to zfs, I''d gladly stick with it, and unless you have a *better* solution (repeat with me: important data, 1 laptop, three disks), please don''t use my name further for your guessing at a hidden plot to discover the (evident) bias of your messages.

thanks --- Stefano Spinucci This message posted from opensolaris.org
On Dec 6, 2007, at 00:03, Anton B. Rang wrote:>> what are you terming as "ZFS'' incremental risk reduction"? > > I''m not Bill, but I''ll try to explain. > > Compare a system using ZFS to one using another file system -- say, > UFS, XFS, or ext3. > > Consider which situations may lead to data loss in each case, and > the probability of each such situation. > > The difference between those two sets is the ''incremental risk > reduction'' provided by ZFS.ah .. thanks Anton - so the next step would be to calculate the probability of occurrence, the impact to operation, and the return to service for each anticipated risk in a given environment in order to determine the size of the increment that constitutes the risk reduction that ZFS is providing. Without this there''s just a lot of hot air blowing around in here .. <snip> excellent summary of risks - perhaps we should also consider the availability and transparency of the code to potentially mitigate future problems .. that''s currently where i''m starting to see tremendous value in open and free raid controller solutions to help drive down the cost of implementation for this sort of data protection instead of paying through the nose for a closed hardware based solutions (which is still a great margin in licensing for dedicated storage vendors) --- .je
> Literacy has nothing to do with the glaringly obvious > BS you keep spewing.Actually, it''s central to the issue: if you were capable of understanding what I''ve been talking about (or at least sufficiently humble to recognize the depths of your ignorance), you''d stop polluting this forum with posts lacking any technical content whatsoever. Rather than answer a question,> which couldn''t be answered,The question that was asked was answered - it''s hardly my problem if you could not competently parse the question, or the answer, or the subsequent explanation (though your continuing drivel after those three strikes suggests that you may simply be ineducable). because you were full of> it, you tried to convince us all he really didn''t > know what he wanted.No: I answered his question and *also* observed that he probably really didn''t know what he wanted (at least insofar as being able to *justify* the intensity of his desire for it). ...> There aren''t free alternatives in linux or freebsd > that do what zfs does, period.No one said that there were: the real issue is that there''s not much reason to care, since the available solutions don''t need to be *identical* to offer *comparable* value (i.e., they each have different strengths and weaknesses and the net result yields no clear winner - much as some of you would like to believe otherwise). You can keep talking> in circles till you''re blue in the face, or I suppose > your fingers go numb in this case, but the fact isn''t > going to change. Yes, people do want zfs for any > number of reasons, that''s why they''re here.Indeed, but it has become obvious that most of the reasons are non-technical in nature. This place is fanboy heaven, where never is heard a discouraging word (and you''re hip-deep in buffalo sh!t). Hell, I came here myself 18 months ago because ZFS seemed interesting, but found out that the closer I looked, the less interesting it got. Perhaps it''s not surprising that so many of you never took that second step: it does require actual technical insight, which seems to be in extremely short supply here. So short that it''s not worth spending time here from any technical standpoint: at this point I''m mostly here for the entertainment, and even that is starting to get a little tedious. - bill This message posted from opensolaris.org
> Actually, it''s central to the issue: if you were > capable of understanding what I''ve been talking about > (or at least sufficiently humble to recognize the > depths of your ignorance), you''d stop polluting this > forum with posts lacking any technical content > whatsoever.I don''t speak "full of myself", apparently nobody else here does either, because nobody has a clue what you continue to ramble about.> The question that was asked was answered - it''s > hardly my problem if you could not competently parse > the question, or the answer, or the subsequent > explanation (though your continuing drivel after > those three strikes suggests that you may simply be > ineducable).Except nobody but you seems to be able to acertain any sort of answer from your rambling response. The question was simple, as would an adequate answer. You either aren''t "literate" enough to understand the question, or you''re wrong. It''s clearly the latter.> No: I answered his question and *also* observed that > he probably really didn''t know what he wanted (at > least insofar as being able to *justify* the > intensity of his desire for it).Funny, the original poster, and everyone else disagrees with you. But with such visions of granduer, I suppose we''re all just wrong.> No one said that there were: the real issue is that > there''s not much reason to care, since the available > solutions don''t need to be *identical* to offer > *comparable* value (i.e., they each have different > strengths and weaknesses and the net result yields no > clear winner - much as some of you would like to > believe otherwise). >Right, so yet again, you were wrong. Stop telling us what you think we need. Stop trying to impose your arrogant ASSumptions onto us. WE don''t care what YOU think WE need.> Indeed, but it has become obvious that most of the > reasons are non-technical in nature. This place is > fanboy heaven, where never is heard a discouraging > word (and you''re hip-deep in buffalo sh!t).There you go. You heard it here first folks. Anyone who doesn''t agree with bill is a fanboy.> > Hell, I came here myself 18 months ago because ZFS > seemed interesting, but found out that the closer I > looked, the less interesting it got. Perhaps it''s > not surprising that so many of you never took that > second step: it does require actual technical > insight, which seems to be in extremely short supply > here. >So leave.> So short that it''s not worth spending time here from > any technical standpoint: at this point I''m mostly > here for the entertainment, and even that is starting > to get a little tedious. > > - billOh bill, I think we both know your ego won''t be able to stop without being banned or getting the *last word*. Unfortunately you bring nothing to the table but arrogance, which hasn''t, and isn''t getting you very far. Keep up the good work though. Are you getting paid by word count, or by post? I''m guessing word count given the long winded content void responses. This message posted from opensolaris.org
> > Now, not being a psychic myself, I can''t state > with > > authority that Stefano really meant to ask the > > question that he posed rather than something else. > > In retrospect, I suppose that some of his > > surrounding phrasing *might* suggest that he was > > attempting (however unskillfully) to twist my > > comment about other open source solutions being > > similarly enterprise-capable into a provably-false > > assertion that those other solutions offered the > > *same* features that he apparently considers so > > critical in ZFS rather than just comparably-useful > > ones. But that didn''t cross my mind at the time: > I > simply answered the question that he asked, and in > passing also pointed out that those features which > he apparently considered so critical might well not > be. > dear bill, > my question was honestThat''s how I originally accepted it, and I wouldn''t have revisited the issue looking for other interpretations if two people hadn''t obviously thought it meant something else. For that matter, even if you actually intended it to mean something else that doesn''t imply that there was any devious intent. In any event, what you actually asked was what I had referred to, and I told you: it may not have met your personal goals for your own storage, but that wasn''t relevant to the question that you asked (and that I answered). Your English is so good that the possibility that it might be a second language had not occurred to me - but if so it would help explain any subtle miscommunication. ...> if there are no alternatives to zfs,As I explained, there are eminently acceptable alternatives to ZFS from any objective standpoint. I''d gladly> stick with it,And you''re welcome to, without any argument from me - unless you try to convince other people that there are strong technical reasons to do so, in which case I''ll challenge you to justify them in detail so that any hidden assumptions can be brought out into the open. - bill This message posted from opensolaris.org
> I suppose we''re all just wrong.By George, you''ve got it! - bill This message posted from opensolaris.org
can you guess? wrote:> >> There aren''t free alternatives in linux or freebsd >> that do what zfs does, period. >> > > No one said that there were: the real issue is that there''s not much reason to care, since the available solutions don''t need to be *identical* to offer *comparable* value (i.e., they each have different strengths and weaknesses and the net result yields no clear winner - much as some of you would like to believe otherwise). > >I see you carefully snipped "You would think the fact zfs was ported to freebsd so quickly would''ve been a good first indicator that the functionality wasn''t already there." A point you appear keen to avoid discussing. Ian
On Wed, Dec 05, 2007 at 09:45:55PM -0800, can you guess? wrote:> > There aren''t free alternatives in linux or freebsd > > that do what zfs does, period. > > No one said that there were: the real issue is that there''s not much > reason to care, since the available solutions don''t need to beIf you don''t care, then go off not caring. (Can we declare this thread dead already?) Others seem to care.> *identical* to offer *comparable* value (i.e., they each have > different strengths and weaknesses and the net result yields no clear > winner - much as some of you would like to believe otherwise).Interoperability counts for a lot for some people. Fewer filesystems to learn about can count too. ZFS provides peace of mind that you tell us doesn''t matter. And it''s actively developed and you and everyone else can see that this is so, and that recent ZFS improvements and others that are in the pipe (and discussed publicly) are very good improvements, which all portends an even better future for ZFS down the line. Whatever you do not like about ZFS today may be fixed tomorrow, except for the parts about it being ZFS, opensource, Sun-developed, ..., the parts that really seem to bother you.
can you guess? wrote:
>> There aren''t free alternatives in linux or freebsd that do what zfs does, period.
>
> No one said that there were: the real issue is that there''s not much reason to care, since the available solutions don''t need to be *identical* to offer *comparable* value (i.e., they each have different strengths and weaknesses and the net result yields no clear winner - much as some of you would like to believe otherwise).

Ok. So according to you, most of what ZFS does is available elsewhere, and the features it has that nothing else has aren''t really a value add, at least not enough to produce a ''clear winner''. Ok, assume for a second that I believe that. Can you list one other software raid/filesystem that has any feature (small or large) that ZFS lacks? Because if all else is really equal, and ZFS is the only one with any advantages, then whether those advantages are small or not (and I don''t agree with how small you think they are - see my other post that you''ve ignored so far) I think there is a ''clear winner'' - at least at the moment; things can change at any time. -Kyle
Whoever coined that phrase must''ve been wrong, it should definitely be "By billtodd you''ve got it". This message posted from opensolaris.org
For the same reason he won''t respond to Jone, and can''t answer the original question. He''s not trying to help this list out at all, or come up with any real answers. He''s just here to troll. This message posted from opensolaris.org
> As I explained, there are eminently acceptable > alternatives to ZFS from any objective standpoint. >So name these mystery alternatives that come anywhere close to the protection, functionality, and ease of use that zfs provides. You keep talking about how they exist, yet can''t seem to come up with any real names. Really, a five page dissertation isn''t required. A simple numbered list will be more than acceptable. Although, I think we all know that won''t happen since you haven''t a list to provide. Oh and "I''m sure there''s something out there, I''m just not sure what" DEFINITELY isn''t an acceptable answer. This message posted from opensolaris.org
On Dec 6, 2007 1:13 AM, Bakul Shah <bakul at bitblocks.com> wrote:> Note that I don''t wish to argue for/against zfs/billtodd but > the comment above about "no *real* opensource software > alternative zfs automating checksumming and simple > snapshotting" caught my eye. > > There is an open source alternative for archiving that works > quite well. venti has been available for a few years now. > It runs on *BSD, linux, macOS & plan9 (its native os). It > uses strong crypto checksums, stored separately from the data > (stored in the pointer blocks) so you get a similar guarantee > against silent data corruption as ZFS.Last time I looked into Venti, it used content hashing to locate storage blocks. Which was really cool, because (as you say) it magically consolidates blocks with the same checksum together. The 45 byte score is the checksum of the top of the tree, isn''t that right? Good to hear it''s still alive and been revamped somewhat. ZFS snapshots and clones save a lot of space, but the ''content-hash == address'' trick means you could potentially save much more. Though I''m still not sure how well it scales up - Bigger working set means you need longer (more expensive) hashes to avoid a collision, and even then its not guaranteed. When i last looked they were still using SHA-160 and I ran away screaming at that point :)> Google for "venti sean dorward". If interested, go to > http://swtch.com/plan9port/ and pick up plan9port (a > collection of programs from plan9, not just venti). See > http://swtch.com/plan9port/man/man8/index.html for how to use > venti.-- Rasputnik :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/
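The scaling worry about hash collisions mentioned above can be bounded with the usual birthday estimate. The block count below is an assumed workload (roughly 1 PiB of unique 8 KiB blocks), and the bound only covers accidental collisions, not deliberately constructed ones:

# Birthday-bound estimate of the accidental-collision risk: with an n-bit
# content hash and N distinct blocks, the probability of any two different
# blocks sharing an address is at most about N^2 / 2^(n+1).

def collision_bound(n_bits, n_blocks):
    return n_blocks * (n_blocks - 1) / 2 / 2 ** n_bits

n_blocks = (1 << 50) // 8192               # ~1 PiB of unique 8 KiB blocks
for bits in (160, 256):                    # SHA-1-sized vs SHA-256-sized scores
    print(f"{bits}-bit hash, {n_blocks:.2e} blocks: "
          f"P(accidental collision) <= {collision_bound(bits, n_blocks):.1e}")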
zfs-discuss-bounces at opensolaris.org wrote on 12/06/2007 09:58:00 AM:> On Dec 6, 2007 1:13 AM, Bakul Shah <bakul at bitblocks.com> wrote: > > > Note that I don''t wish to argue for/against zfs/billtodd but > > the comment above about "no *real* opensource software > > alternative zfs automating checksumming and simple > > snapshotting" caught my eye. > > > > There is an open source alternative for archiving that works > > quite well. venti has been available for a few years now. > > It runs on *BSD, linux, macOS & plan9 (its native os). It > > uses strong crypto checksums, stored separately from the data > > (stored in the pointer blocks) so you get a similar guarantee > > against silent data corruption as ZFS. > > Last time I looked into Venti, it used content hashing to > locate storage blocks. Which was really cool, because (as > you say) it magically consolidates blocks with the same checksum > together. > > The 45 byte score is the checksum of the top of the tree, isn''t that > right? > > Good to hear it''s still alive and been revamped somewhat. > > ZFS snapshots and clones save a lot of space, but the > ''content-hash == address'' trick means you could potentially save > much more. > > Though I''m still not sure how well it scales up - > Bigger working set means you need longer (more expensive) hashes > to avoid a collision, and even then its not guaranteed. > > When i last looked they were still using SHA-160 > and I ran away screaming at that point :)The hash chosen is close to inconsequential as long as you perform collision checks and the collision rate is "low". Hash key collision branching is pretty easy and has been used for decades (see perl''s collision forking for hash var key collisions for an example). The process is lookup a key, verify data matches, if it does inc the ref count store and go, if no match split out a sub key, store and go. There are "cost" curves for both the hashing, and data matching portions. As the number of hash matches goes up so does the cost for data verifying -- but no matter what hash you use (assuming at least one bit less information then the original data) there _will_ be collisions possible so the verify must exist. -Wade
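A compact sketch of the verify-and-branch scheme Wade describes: the hash is only a candidate address, and the stored bytes are always compared before a block is treated as a duplicate, so even a hash collision cannot silently merge two different blocks. The class and method names here are made up for illustration:

# Sketch of a content-addressed block store with collision branching.
# Purely illustrative - not venti's or any other product's actual code.

import hashlib

class ContentStore:
    def __init__(self):
        self.buckets = {}                  # digest -> list of [data, refcount]

    def put(self, data: bytes):
        key = hashlib.sha256(data).digest()
        bucket = self.buckets.setdefault(key, [])
        for slot, entry in enumerate(bucket):
            if entry[0] == data:           # true duplicate: bump the refcount
                entry[1] += 1
                return key, slot
        bucket.append([data, 1])           # new block (or a colliding one)
        return key, len(bucket) - 1        # sub-slot disambiguates collisions

    def get(self, key: bytes, slot: int = 0) -> bytes:
        return self.buckets[key][slot][0]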
> apologies in advance for prolonging this thread ..Why do you feel any need to? If you were contributing posts as completely devoid of technical content as some of the morons here have recently been submitting I could understand it, but my impression is that the purpose of this forum is to explore the kind of questions that you''re interested in discussing. i> had considered > taking this completely offline, but thought of a few > people at least > who might find this discussion somewhat interestingAnd any who don''t are free to ignore it, so no harm done there either.> .. at the least i > haven''t seen any mention of Merkle trees yet as the > nerd in me yearns > forI''d never heard of them myself until recently, despite having come up with the idea independently to use a checksumming mechanism very similar to ZFS''s. Merkle seems to be an interesting guy - his home page is worth a visit.> > On Dec 5, 2007, at 19:42, bill todd - aka can you > guess? wrote: > > >> what are you terming as "ZFS'' incremental risk > reduction"? .. > >> (seems like a leading statement toward a > particular assumption) > > > > Primarily its checksumming features, since other > open source > > solutions support simple disk scrubbing (which > given its ability to > > catch most deteriorating disk sectors before they > become unreadable > > probably has a greater effect on reliability than > checksums in any > > environment where the hardware hasn''t been slapped > together so > > sloppily that connections are flaky). > > ah .. okay - at first reading "incremental risk > reduction" seems to > imply an incomplete approach to riskThe intent was to suggest a step-wise approach to risk, where some steps are far more significant than others (though of course some degree of overlap between steps is also possible). *All* approaches to risk are incomplete. ... i do> believe that an interesting use of the merkle tree > with a sha256 hash > is somewhat of an improvement over conventional > volume based data > scrubbing techniquesOf course it is: that''s why I described it as ''incremental'' rather than as ''redundant''. The question is just how *significant* an improvement it offers. since there can be a unique> integration between > the hash tree for the filesystem block layout and a > hierarchical data > validation method. In addition to the finding > unknown areas with the > scrub, you''re also doing relatively inexpensive data > validation > checks on every read.Yup. ...> sure - we''ve seen many transport errors,I''m curious what you mean by that, since CRCs on the transports usually virtually eliminate them as problems. Unless you mean that you''ve seen many *corrected* transport errors (indicating that the CRC and retry mechanisms are doing their job and that additional ZFS protection in this area is probably redundant). as well as> firmware > implementation errorsQuantitative and specific examples are always good for this kind of thing; the specific hardware involved is especially significant to discussions of the sort that we''re having (given ZFS''s emphasis on eliminating the need for much special-purpose hardware). .. in fact with many arrays> we''ve seen data > corruption issues with the scrubI''m not sure exactly what you''re saying here: is it that the scrub has *uncovered* many apparent instances of data corruption (as distinct from, e.g., merely unreadable disk sectors)? 
(particularly if the> checksum is > singly stored along with the data block)Since (with the possible exception of the superblock) ZFS never stores a checksum ''along with the data block'', I''m not sure what you''re saying there either. - just like> spam you really > want to eliminate false positives that could indicate > corruption > where there isn''t any.The only risk that ZFS''s checksums run is the infinitesimal possibility that corruption won''t be detected, not that they''ll return a false positive. if you take some time to read> the on disk > format for ZFS you''ll see that there''s a tradeoff > that''s done in > favor of storing more checksums in many different > areas instead of > making more room for direct block pointers.While I haven''t read that yet, I''m familiar with the trade-off between using extremely wide checksums (as ZFS does - I''m not really sure why, since cryptographic-level security doesn''t seem necessary in this application) and limiting the depth of the indirect block tree. But (yet again) I''m not sure what you''re trying to get at here. ... on this list we''ve seen a number of consumer> level products > including sata controllers, and raid cards (which are > also becoming > more commonplace in the consumer realm) that can be > confirmed to > throw data errors.Your phrasing here is a bit unusual (''throwing errors'' - or exceptions - is not commonly related to corrupting data). If you''re referring to some kind of silent data corruption, once again specifics are important: to put it bluntly, a lot of the people here are only semi-competent technically and appear to have a personal interest in finding ways to justify their enthusiasm for ZFS, thus their anecdotal reports require scrutiny (especially since the only formal studies that I''m familiar with in this area don''t seem to reflect their reported experiences). Code maturity issues aside, there> aren''t very > many array vendors that are open-sourcing their array > firmware - and > if you consider zfs as a feature-set that could > function as a multi- > purpose storage array (systems are cheap) - i find it > refreshing that > everything that''s being done under the covers is > really out in the open.As, of course, is equally the case with other open-source software solutions that offer similar reliability: ZFS is not exactly anything new in this respect, save for its specific checksumming mechanism (the incremental value of which is the question here).> > > And otherwise undetected disk errors occur with > negligible > > frequency compared with software errors that can > silently trash > > your data in ZFS cache or in application buffers > (especially in PC > > environments: enterprise software at least tends > to be more stable > > and more carefully controlled - not to mention > their typical use of > > ECC RAM). > > > > So depending upon ZFS''s checksums to protect your > data in most PC > > environments is sort of like leaving on a vacation > and locking and > > bolting the back door of your house while leaving > the front door > > wide open: yes, a burglar is less likely to enter > by the back > > door, but thinking that the extra bolt there made > you much safer is > > likely foolish. 
> > granted - it''s not an all-in-one solution, but by > combining the > merkle tree approach with the sha256 checksum along > with periodic > data scrubbing - it''s a darn good approach ..As are the other open-source solutions that offer similar reliability: you can keep touting ZFS''s checksums until the cows come home, but unless you *quantify* their incremental value (as Anton just took a stab at doing elsewhere) it''s just more fanboy babble (which again may seem a bit blunt, but when you keep asserting that something is significantly valuable without ever being willing to step up to the plate and quantify that value - especially in comparison to other remaining risks - after repeated challenges to do so a little bluntness seems to be in order).> particularly since it > also tends to cost a lot less than what you might > have to pay > elsewhere for something you can''t really see inside.But no more than the other open-source solutions that you *can* see inside. ... you do seem to> be repeating the > phrase "incremental protection" recently which i > think i take issue > with.Then I hope that the above has helped you understand it better. ... checksums and scrubbing are only a piece of> the larger data > protection schemes.And a piece whose utility is (I''ll say it again) "quantifiable". This should really be used along> with snapshots, > replication, and backupOnly if the checksums contribute significantly to the overall result - otherwise, they''re just another ''nice to have, all other things being equal'' feature. Waving one''s hands about snapshots, replication, and backup is not enough: you have to quantify how much each adds to the overall level of protection and then see how much *more* a feature like ZFS''s checksums reduces the residual risk. Do you have more than one backup copy, and do you compare each with the original after creating it, and are at least some of them off site, and do you periodically verify that their contents are readable? If any of those is not true, then your residual risk probably won''t be markedly reduced by the existence of ZFS-style checksums. - bill This message posted from opensolaris.org
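Since the "checksum stored in the block pointer rather than with the data" point keeps coming up in this exchange, here is a toy model (not ZFS's actual block-pointer layout) of validation on every read. As stated above, the checksum by itself only detects the problem; repair still requires a redundant copy to read instead.

    import hashlib
    from dataclasses import dataclass

    @dataclass
    class BlockPtr:
        address: int       # where the child block lives
        checksum: bytes    # expected hash of the child, kept in the *parent*, not with the data

    disk = {}              # address -> bytes, standing in for the storage device

    def write_block(addr, data):
        disk[addr] = data
        return BlockPtr(addr, hashlib.sha256(data).digest())

    def read_block(ptr):
        data = disk[ptr.address]
        if hashlib.sha256(data).digest() != ptr.checksum:
            # Detection only: repair needs a redundant copy to read instead.
            raise IOError("checksum mismatch at block %d" % ptr.address)
        return data

    ptr = write_block(7, b"important data")
    disk[7] = b"important dat!"       # simulate silent corruption on the read path
    try:
        read_block(ptr)
    except IOError as e:
        print(e)                      # the bad read is caught on the way back up the tree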
> can you guess? wrote: > > > >> There aren''t free alternatives in linux or freebsd > >> that do what zfs does, period. > >> > > > > No one said that there were: the real issue is > that there''s not much reason to care, since the > available solutions don''t need to be *identical* to > offer *comparable* value (i.e., they each have > different strengths and weaknesses and the net result > yields no clear winner - much as some of you would > like to believe otherwise). > > > > > I see you carefully snipped "You would think the fact > zfs was ported to > freebsd so quickly would''ve been a good first > indicator that the > functionality wasn''t already there." A point you > appear keen to avoid > discussing.Hmmm - do I detect yet another psychic-in-training here? Simply ignoring something that one considers irrelevant does not necessarily imply any active desire to *avoid* discussing it. I suspect that whoever ported ZFS to FreeBSD was a fairly uncritical enthusiast just as so many here appear to be (and I''ll observe once again that it''s very easy to be one, because ZFS does sound impressive until you really begin looking at it closely). Not to mention the fact that open-source operating systems often gather optional features more just because they can than because they necessarily offer significant value: all it takes is one individual who (for whatever reason) feels like doing the work. Linux, for example, is up to its ears in file systems, all of which someone presumably felt it worthwhile to introduce there. Perhaps FreeBSD proponents saw an opportunity to narrow the gap in this area (especially since incorporating ZFS into Linux appears to have licensing obstacles). In any event, the subject under discussion here is not popularity but utility - *quantifiable* utility - and hence the porting of ZFS to FreeBSD is not directly relevant. - bill This message posted from opensolaris.org
(Can we> declare this thread > dead already?)Many have already tried, but it seems to have a great deal of staying power. You, for example, have just contributed to its continued vitality.> > Others seem to care. > > > *identical* to offer *comparable* value (i.e., they > each have > > different strengths and weaknesses and the net > result yields no clear > > winner - much as some of you would like to believe > otherwise). > > Interoperability counts for a lot for some people.Then you''d better work harder on resolving the licensing issues with Linux.> Fewer filesystems to > earn about can count too.And since ZFS differs significantly from its more conventional competitors, that''s something of an impediment to acceptance.> > ZFS provides peace of mind that you tell us doesn''t > matter.Sure it matters, if it gives that to you: just don''t pretend that it''s of any *objective* significance, because *that* requires actual quantification. And it''s> actively developed and you and everyone else can see > that this is so,Sort of like ext2/3/4, and XFS/JFS (though the latter have the advantage of already being very mature, hence need somewhat less ''active'' development).> and that recent ZFS improvements and others that are > in the pipe (and > discussed publicly) are very good improvements, which > all portends an > even better future for ZFS down the line.Hey, it could even become a leadership product someday. Or not - time will tell.> > Whatever you do not like about ZFS today may be fixed > tomorrow,There''d be more hope for that if its developers and users seemed less obtuse. except> for the parts about it being ZFS, opensource, > Sun-developed, ..., the > parts that really seem to bother you.Specific citations of material that I''ve posted that gave you that impression would be useful: otherwise, you just look like another self-professed psychic (is this a general characteristic of Sun worshipers, or just of ZFS fanboys?). - bill This message posted from opensolaris.org
> The 45 byte score is the checksum of the top of the tree, isn''t that > right?Yes. Plus an optional label.> ZFS snapshots and clones save a lot of space, but the > ''content-hash == address'' trick means you could potentially save > much more.Especially if you carry around large files (disk images, databases) that change.> Though I''m still not sure how well it scales up - > Bigger working set means you need longer (more expensive) hashes > to avoid a collision, and even then its not guaranteed.> When i last looked they were still using SHA-160 > and I ran away screaming at that point :)You need 2^80 blocks for a 50%+ probability that a pair will have the same SHA-160 hash (by the birthday paradox). Crypto attacks are not relevant. For my personal use I am willing to live with these odds until my backups cross 2^40 distinct blocks (greater than 8 Petabytes)!
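The 2^80 figure is the usual birthday-bound estimate. A quick check under that approximation (these are probabilities of any random collision among n distinct blocks, not of a deliberate attack; the exact 50% point is slightly above 2^80, at roughly 1.18 * 2^80 distinct blocks):

    import math

    def collision_probability(n_blocks, hash_bits):
        # Birthday approximation: p ~= 1 - exp(-n^2 / 2N), with N = 2**hash_bits buckets
        return -math.expm1(-(n_blocks ** 2) / (2.0 * 2.0 ** hash_bits))

    print(collision_probability(2.0 ** 80, 160))   # ~0.39; the 50% point is near 1.18 * 2^80 blocks
    print(collision_probability(2.0 ** 40, 160))   # ~4e-25 for the "8 petabytes of distinct blocks" case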
STILL haven''t given us a list of these filesystems you say match what zfs does. STILL coming back with long winded responses with no content whatsoever to try to divert the topic at hand. And STILL making incorrect assumptions. This message posted from opensolaris.org
> can you guess? wrote: > >> There aren''t free alternatives in linux or freebsd > >> that do what zfs does, period. > >> > > > > No one said that there were: the real issue is > that there''s not much reason to care, since the > available solutions don''t need to be *identical* to > offer *comparable* value (i.e., they each have > different strengths and weaknesses and the net result > yields no clear winner - much as some of you would > like to believe otherwise). > Ok. So according to you, most of what ZFS does is > available elsewhere, > and the features it has that nothing else has are'' > really a value add, > ar least not enough to produce a ''clear winner''. Ok, > assume for a second > that I believe that.Unlike so many here I don''t assume things lightly - and this one seems particularly shaky. can you list one other software> raid/filesystem > that as any feature (small or large) that ZFS lacks?Well, duh.> > Because if all else is really equal, and ZFS is the > only one with any > advantages then, whether those advantages are small > or not (and I don''t > agree with how small you think they are - see my > other post that you''ve > ignored so far.)Sorry - I do need to sleep sometimes. But I''ll get right to it, I assure you (or at worst soon: time has gotten away from me again and I''ve got an appointment to keep this afternoon). I think there is a ''clear winner'' -> at least at the > moment - Things can change at any time.You don''t get out much, do you? How does ZFS fall short of other open-source competitors (I''ll limit myself to them, because once you get into proprietary systems - and away from the quaint limitations of Unix file systems - the list becomes utterly unmanageable)? Let us count the ways (well, at least the ones that someone as uninformed as I am about open-source features can come up with off the top of his head), starting in the volume-management arena: 1. RAID-Z, as I''ve explained elsewhere, is brain-damaged when it comes to effective disk utilization for small accesses (especially reads): RAID-5 offers the same space efficiency with N times the throughput for such workloads (used to be provided by mdadm on Linux unless the Linux LVM now supports it too). 2. DRDB on Linux supports remote replication (IIRC there was an earlier, simpler mechanism that also did). 3. Can you yet shuffle data off a disk such that it can be removed from a zpool? LVM on Linux supports this. 4. Last I knew, you couldn''t change the number of disks in a RAID-Z stripe at all, let alone reorganize existing stripe layout on the fly. Typical hardware RAIDs can do this and I thought that Linux RAID support could as well, but can''t find verification now - so I may have been remembering a proposed enhancement. And in the file system arena: 5. No user/group quotas? What *were* they thinking? The discussions about quotas here make it clear that per-filesystem quotas are not an adequate alternative: leaving aside the difficulty of simulating both user *and* group quotas using that mechanism, using it raises mount problems when very large numbers of users are involved, plus hard-link and NFS issues crossing mount points. 6. ZFS''s total disregard of on-disk file contiguity can torpedo sequential-access performance by well over a decimal order of magnitude in situations where files either start out severely fragmented (due to heavily parallel write activity during their population) or become so due to fine-grained random updates. 7. 
ZFS''s full-path COW approach increases the space overhead of snapshots compared with conventional file systems. 8. Not available on Linux. Damn - I''ve got to run. Perhaps others more familiar with open-source alternatives will add to this list while I''m out. - bill This message posted from opensolaris.org
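Regarding point 1 in the list above, the RAID-Z small-read argument reduces to simple arithmetic if one assumes that every RAID-Z block is striped across all data disks (so each random read keeps the whole stripe busy) while a small RAID-5 read touches only one disk. The per-disk IOPS figure below is an arbitrary illustrative number, and caching and real workloads will blur the ratio; treat it as a back-of-the-envelope sketch only.

    def small_random_read_iops(disks, per_disk_iops=150.0):
        data_disks = disks - 1                 # one disk's worth of parity in both layouts
        return {
            # RAID-Z: each block is spread across every data disk in the stripe,
            # so one small read occupies all of them at once.
            "raidz": per_disk_iops,
            # RAID-5 with small blocks: a small read lands on a single disk,
            # so independent reads proceed on all data disks in parallel.
            "raid5": per_disk_iops * data_disks,
        }

    print(small_random_read_iops(9))   # {'raidz': 150.0, 'raid5': 1200.0}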
...> ZFS snapshots and clones save a lot of space, but the > ''content-hash == address'' trick means you could > potentially save > much more.Several startups have emerged over the past few years based on this idea of ''data deduplication'', and some have been swallowed up by bigger fish that clearly think it''s promising. But almost all such efforts have focused on backup/archive data rather than on primary disk storage - the only example of the latter that I''ve noticed is that NetApp started supporting deduplication for primary disk storage about 6 months ago. - bill This message posted from opensolaris.org
> So name these mystery alternatives that come anywhere > close to the protection,If you ever progress beyond counting on your fingers you might (with a lot of coaching from someone who actually cares about your intellectual development) be able to follow Anton''s recent explanation of this (given that the higher-level overviews which I''ve provided apparently flew completely over your head).> functionality,I discussed that in detail elsewhere here yesterday (in more detail than previously in an effort to help the slower members of the class keep up). and ease of> useThat actually may be a legitimate (though hardly decisive) ZFS advantage: it''s too bad its developers didn''t extend it farther (e.g., by eliminating the vestiges of LVM redundancy management and supporting seamless expansion to multi-node server systems). - bill This message posted from opensolaris.org
> If you ever progress beyond counting on your fingers > you might (with a lot of coaching from someone who > actually cares about your intellectual development) > be able to follow Anton''s recent explanation of this > (given that the higher-level overviews which I''ve > provided apparently flew completely over your head).Seriously, are you 14? NOTHING anton listed takes the place of ZFS, and your pie in the sky theories do not a product make. So yet again, your long winded insult ridden response can be translated to "My name is billtodd, I haven''t a fucking clue, I''m wrong, so I''ll defer to my usual defensive tactics of attempting to insult those who know more, and have more experience in the REAL WORLD than I". You do a GREAT job of spewing theory, you do a piss poor job of relating ANYTHING to the real world.> I discussed that in detail elsewhere here yesterday > (in more detail than previously in an effort to help > the slower members of the class keep up).No, no you didn''t. You listed of a couple of bullshit products that don''t come anywhere near the features of ZFS. Let''s throw out a bunch of half-completed projects that require hours of research just to setup, much less integrate, and call it done. MDADM, next up you''ll tell us we really never needed to move beyond fat, because hey, that really was *good enough* too! But of course, your usual "well I haven''t really used the product, but I have read up on it" excuse must be a get-out-of jail free card, AMIRITE?!> > and ease of > use > > That actually may be a legitimate (though hardly > decisive) ZFS advantage: it''s too bad its developers > didn''t extend it farther (e.g., by eliminating the > vestiges of LVM redundancy management and supporting > seamless expansion to multi-node server systems). > > - billRight, they definitely shouldn''t have released zfs because every feature they ever plan on implementing wasn''t there yet. Tell you what, why don''t you try using some of these products you''ve "read about", then you can come back and attempt to continue this discussion. I don''t care what your *THEORIES* are, I care about how things work here in the real world. Better yet, you get back to writing that file system that''s going to fix all these horrible deficiencies in zfs. Then you can show the world just how superior you are. *RIGHT* This message posted from opensolaris.org
I believe the data "dedup" is also a feature of NTFS. -- Darren J Moffat
Darren, Do you happen to have any links for this? I have not seen anything about NTFS and CAS/dedupe besides some of the third party apps/services that just use NTFS as their backing store. Thanks! Wade Stuart Fallon Worldwide P: 612.758.2660 C: 612.877.0385 zfs-discuss-bounces at opensolaris.org wrote on 12/07/2007 12:44:31 PM:> I believe the data "dedup" is also a feature of NTFS. > > -- > Darren J Moffat > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Darren J Moffat
2007-Dec-07 19:23 UTC
[zfs-discuss] OT: NTFS Single Instance Storage (Re: Yager on ZFS
Wade.Stuart at fallon.com wrote:> Darren, > > Do you happen to have any links for this? I have not seen anything > about NTFS and CAS/dedupe besides some of the third party apps/services > that just use NTFS as their backing store.Single Instance Storage is what Microsoft uses to refer to this: http://research.microsoft.com/sn/Farsite/WSS2000.pdf -- Darren J Moffat
Wade.Stuart at fallon.com
2007-Dec-07 19:37 UTC
[zfs-discuss] OT: NTFS Single Instance Storage (Re: Yager on ZFS
Thanks Darren. I found another link that goes into the 2003 implementation: http://blogs.technet.com/filecab/archive/tags/Single+Instance+Store+_2800_SIS_2900_/default.aspx It looks pretty nice, although I am not sure about the userland dedup service design -- I would like to see it implemented closer to the fs and dealing with blocks instead of files. zfs-discuss-bounces at opensolaris.org wrote on 12/07/2007 01:23:22 PM:> Wade.Stuart at fallon.com wrote: > > Darren, > > > > Do you happen to have any links for this? I have not seenanything> > about NTFS and CAS/dedupe besides some of the third party apps/services > > that just use NTFS as their backing store. > > Single Instance Storage is what Microsoft uses to refer to this: > > http://research.microsoft.com/sn/Farsite/WSS2000.pdf > > > -- > Darren J Moffat > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Once again, profuse apologies for having taken so long (well over 24 hours by now - though I''m not sure it actually appeared in the forum until a few hours after its timestamp) to respond to this.> can you guess? wrote: > > > > Primarily its checksumming features, since other > open source solutions support simple disk scrubbing > (which given its ability to catch most deteriorating > disk sectors before they become unreadable probably > has a greater effect on reliability than checksums in > any environment where the hardware hasn''t been > slapped together so sloppily that connections are > flaky). > > > From what I''ve read on the subject, That premise > seems bad from the > tart.Then you need to read more or understand it better. I don''t believe that scrubbing will catch all> the types of > errors that checksumming will.That''s absolutely correct, but it in no way contradicts what I said (and you quoted) above. Perhaps you should read that again, more carefully: it merely states that disk scrubbing probably has a *greater* effect on reliability than checksums do, not that it completely subsumes their features. There are a category> of errors that are > not caused by firmware, or any type of software. The > hardware just > doesn''t write or read the correct bit value this time > around. With out a > checksum there''s no way for the firmware to know, and > next time it very > well may write or read the correct bit value from the > exact same spot on > the disk, so scrubbing is not going to flag this > sector as ''bad''.It doesn''t have to, because that''s a *correctable* error that the disk''s extensive correction codes (which correct *all* single-bit errors as well as most considerably longer error bursts) resolve automatically.> > Now you may claim that this type of error happens so > infrequentlyNo, it''s actually one of the most common forms, due to the desire to pack data on the platter as tightly as possible: that''s why those long correction codes were created. Rather than comment on the rest of your confused presentation about disk error rates, I''ll just present a capsule review of the various kinds: 1. Correctable errors (which I just described above). If a disk notices that a sector *consistently* requires correction it may deal with it as described in the next paragraph. 2. Errors that can be corrected only with retries (i.e., the sector is not *consistently* readable even after the ECC codes have been applied, but can be successfully read after multiple attempts which can do things like fiddle slightly with the head position over the track and signal amplification to try to get a better response). A disk may try to rewrite such a sector in place to see if its readability improves as a result, and if it doesn''t will then transparently revector the data to a spare sector if one exists and mark the original sector as ''bad''. Background scrubbing gives the disk an opportunity to discover such sectors *before* they become completely unreadable, thus significantly improving reliability even in non-redundant environments. 3. Uncorrectable errors (bursts too long for the ECC codes to handle even after the kinds of retries described above, but which the ECC codes can still detect): scrubbing catches these as well, and if suitable redundancy exists it can correct them by rewriting the offending sector (the disk may transparently revector it if necessary, or the LVM or file system can if the disk can''t). 
Disk vendor specs nominally state that one such error may occur for every 10^14 bits transferred for a contemporary commodity (ATA or SATA) drive (i.e., about once in every 12.5 TB), but studies suggest that in practice they''re much rarer. 4. Undetectable errors (errors which the ECC codes don''t detect but which ZFS''s checksums presumably would). Disk vendors no longer provide specs for this reliability metric. My recollection from a decade or more ago is that back when they used to it was three orders of magnitude lower than the uncorrectable error rate: if that still obtained it would mean about once in every 12.5 petabytes transferred, but given that the real-world incidence of uncorrectable errors is so much lower than speced and that ECC codes keep increasing in length it might be far lower than that now. ...> > Aside from the problems that scrubbing handles (and > you need scrubbing even if you have checksums, > because scrubbing is what helps you *avoid* data loss > rather than just discover it after it''s too late to > do anything about it), and aside from problems > Again I think you''re wrong on the basis for your > point.No: you''re just confused again. The checksumming> in ZFS (if I understand it correctly) isn''t used for > only detecting the > problem. If the ZFS pool has any redundancy at all, > those same checksums > can be used to repair that same data, thus *avoiding* > the data loss.1. Unlike things like disk ECC codes, ZFS''s checksums can''t repair data: they just detect that it''s corrupt. 2. So does disk scrubbing, save for the *extremely* rare cases of undetectable errors (see above) or other rare errors that aren''t related to transferring bits to and from the disk platter (see Anton''s recent post, for example). 3. Unlike disk scrubbing, ZFS''s checksums per se only validate data when it happens to be read, and only one copy of it - so ZFS internally schedules background data scrubs that presumably read everything, including applicable redundancy (this can be more expensive than the streaming-sequential background scrubs that can be performed when you don''t have to validate file-structured checksum information, but the additional overhead shouldn''t be important given that it occurs in the background). 4. With both approaches, if redundancy is present then when the corrupt data is detected it can be corrected by rewriting it using the good copy generated from the redundancy. (more confusion snipped)> > Robert Milkowski cited some sobering evidence that > mid-range arrays may have non-negligible firmware > problems that ZFS could often catch, but a) those are > hardly ''consumer'' products (to address that > sub-thread, which I think is what applies in > Stefano''s case) and b) ZFS''s claimed attraction for > higher-end (corporate) use is its ability to > *eliminate* the need for such products (hence its > ability to catch their bugs would not apply - though > I can understand why people who needed to use them > anyway might like to have ZFS''s integrity checks > along for the ride, especially when using > less-than-fully-mature firmware). > > > > > Every drive has firmware too. If it can be used to > detect and repair > array firmware problems, then it can be used by > consumers to detect and > repair drive firmware problems too.As usual, the question is whether that *matters* in any practical sense. 
Commodity drive firmware is a) far less complex than array firmware and b) is typically exposed to only a few standard operations that are far more thoroughly exercised than array firmware (i.e., any significant bugs tend to get flushed out long before it hits the field). Formal root-cause error analyses that I''ve seen have not identified disk firmware bugs as a significant source of error in conventional installations. The CERN study did find an adverse interaction between the firmware in its commodity drives and the firmware in its 3Ware RAID controllers due to the unusual demands that the latter were placing on the former plus the latter''s inclination to ignore disk time-outs, but that''s hardly a ''commodity'' environment - and was the reason I specifically focused my comment on ZFS''s claimed ability to *avoid* the need to use such hardware aids that might be less thoroughly wrung out than commodity drives in commodity environments. ...> Sure it''s true that something else that could trash > your data without > checksumming can still trash your data with it. But > making sure that the > data gets unmangled if it can is still worth > something,And I''ve never suggested otherwise: the question (once again) is *how much* it''s worth, and the answer in most situations is "not all that much, because it doesn''t significantly reduce exposure due to the magnitude of the *other* error sources that remain present even if checksums are used". Is everyone here (Anton excepted) so mathematically-challenged that they can''t grasp the fact that while something may be ''good'' in an abstract sense, whether it''s actually *valuable* is a *quantitative* question? and the> improvements you point out are needed in other > components would be > pointless (according to your argument) if something > like ZFS didn''t also > exist.No, you''re still confused. I listed a bunch of things you''d have to do to protect your data in typical situations before residual risk became sufficiently low that further reducing it via ZFS-style checksumming would have noticeable benefit, but they''re all eminently useful without ZFS as well. Hmmm - perhaps that'' s once again too abstract for you to follow, so let''s try something more concrete. Say your current risk level on a 100 point scale is 20 with no special precautions taken at all. Back up your data with no other changes and it might go down to 15. Back up your data and verify the backup as well (but no other changes) and it might go down to 12. Back up and verify your data multiple times at multiple sites (no other changes) and it might go down to 5. Periodically verify that your backups are still readable and it might go down to 3. So without using ZFS you can reduce your risk from a level of 20 to a level of 3: sounds worthwhile to me. Now that you''ve done that, if you use ZFS-style checksumming perhaps you can reduce your level of risk from 3 to 2 - and for some installations that might well be worth doing. On the other hand, if you use ZFS *without* taking the other steps, you only reduce your risk level from 20 to 19: perhaps measurable, but probably not noticeable and almost certainly not sufficiently worthwhile *by itself* to change your platform. 
Whoops - I seem to have said something very similar just below in the material that you quoted, but perhaps reasoning by analogy was not sufficiently concrete either.> > > So depending upon ZFS''s checksums to protect your > data in most PC environments is sort of like leaving > on a vacation and locking and bolting the back door > of your house while leaving the front door wide open: > yes, a burglar is less likely to enter by the back > door, but thinking that the extra bolt there made > you much safer is likely foolish....> > What I''m saying is that if you *really* care about > your data, then you need to be willing to make the > effort to lock and bolt the front door as well as the > back door and install an alarm system: if you do > that, *then* ZFS''s additional protection mechanisms > may start to become significant (because you''re > eliminated the higher-probability risks and ZFS''s > extra protection then actually reduces the > *remaining* risk by a significant percentage). > > > > > Agreed. Depending on only one copy of your important > data is > shortsighted. But using a tool like ZFS on at least > the most active > copy, if not all copies will be an improvement, if it > even once stops > you from having to go to your other copies.And disk scrubbing is almost equally likely to accomplish this, because it catches all but a minute portion of the same kinds of problems that ZFS catches.> > Also it''s interesting that you use the term ''alarm > system''. That''s > exactly how I view the checksumming features of ZFS. > It is an alarm that > goes off if any of my bits have been lost to an > invisible ''burglar''.As is does disk scrubbing.> > I''ve also noticed how you happen to skip the data > replication features > of ZFS.I suspect that you''re not talking about RAID but about snapshots.> While they may not be everything you''ve hopedRAID-Z certainly isn''t and ZFS''s more general approach to internal redundancy could be more automated and flexible, but ZFS snapshots are fairly similar to most other file system implementations: the only potentially superior approach that I''m acquainted with is something like Interbase''s multi-versioning mechanism, which trades off access performance to historical data for a more compact representation and more flexibility in moving data around without creating additional snapshot overhead.> they would be, > they are features that will have value to people who > want to do exactly > what you suggest, keeping multiple copies of their > data in multiple places.You have me at a disadvantage here, because I''m not even a Unix (let alone Solaris and Linux) aficionado. But don''t Linux snapshots in conjunction with rsync (leaving aside other possibilities that I''ve never heard of) provide rather similar capabilities (e.g., incremental backup or re-synching), especially when used in conjunction with scripts and cron? ...> On the cost side of things, I think you also miss a > point. > > The data checking *and repair* features of ZFS bring > down the cost of > storage not just on the cost of the software. It also > allows (as in > safeguards) the use of significantly lower priced > Hardware (SATA drives > instead of SAS or FCAL, or expensive arrays) by > making up for the > slightly higher possibility of problems that hardware > brings with it.Nothing which you describe above is unique to ZFS: comparable zero-cost open-source solutions are available on Linux using its file systems, logical volume management, and disk scrubbing. 
...> >> i''d love to see > >> the improvements on the many shortcomings you''re > >> pointing to and > >> passionate about written up, proposed, and freely > >> implemented :) > >> > > > > Then ask the ZFS developers to get on the stick: > fixing the fragmentation problem discussed elsewhere > should be easy, and RAID-Z is at least amenable to a > redesign (though not without changing the on-disk > metadata structures a bit - but while they''re at it, > they could include support for data redundancy in a > manner analogous to ditto blocks so that they could > get rid of the vestigial LVM-style management in > that area). > > > > I think he was suggesting that if it''s so important > to you, go ahead and > submit the changes yourself.Then he clearly hadn''t read my earlier posts in which I explained that I have no interest whatsoever in doing that: I just came here on the off-chance that some technically interesting insights might be found, and have mostly stuck around since (despite the conspicuous lack of such insights) because I got sufficiently disgusted with some of the attitudes here that I decided to confront them (it''s also kind of entertaining, though so far only in an intellectually-slumming sort of way that I won''t really miss after things have run their course). - bill This message posted from opensolaris.org
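For concreteness, the 10^14-bits spec quoted earlier in this message works out as follows (spec-sheet arithmetic only; as noted above, observed rates tend to be considerably better than the spec):

    SPEC_UBER = 1e-14                  # quoted spec: one uncorrectable error per 10^14 bits

    def expected_unreadable_events(tb_read):
        bits = tb_read * 1e12 * 8      # decimal terabytes, as drive vendors count them
        return bits * SPEC_UBER

    print(expected_unreadable_events(12.5))   # ~1.0, i.e. about once per 12.5 TB transferred
    print(expected_unreadable_events(1.0))    # ~0.08 expected per full pass over a 1 TB drive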
> You have me at a disadvantage here, because I''m not > even a Unix (let alone Solaris and Linux) aficionado. > But don''t Linux snapshots in conjunction with rsync > (leaving aside other possibilities that I''ve never > heard of) provide rather similar capabilities (e.g., > incremental backup or re-synching), especially when > used in conjunction with scripts and cron? >Which explains why you keep ranting without knowing what you''re talking about. Copy-on-write. Even a bookworm with 0 real-life-experience should be able to apply this one to a working situation. There''s a reason ZFS (and netapp) can take snapshots galore without destroying their filesystem performance. Hell this one even applies *IN THEORY*, so you might not even have to *slum* with any real-world usage to grasp the concept. This message posted from opensolaris.org
> There are a category of errors that are not caused by firmware, or any
> type of software. The hardware just doesn't write or read the correct
> bit value this time around. With out a checksum there's no way for the
> firmware to know, and next time it very well may write or read the
> correct bit value from the exact same spot on the disk, so scrubbing is
> not going to flag this sector as 'bad'.

There seems to be a lot of ignorance about how disks actually work in this thread. Here's the data path, to a first approximation.

Processor <=> RAM <=> controller RAM <=> disk cache RAM <=> read/write head <=> media

There are four buses in the above (which is a slight oversimplification): the processor/memory bus, the internal I/O bus (e.g. PCI), the external I/O bus (e.g. SATA), and the internal disk bus. (The last arrow isn't a bus, it's the magnetic field.) Errors can be introduced at any point and there are corresponding error detection and correction mechanisms at each point.

Processor: Usually parity on internal registers & buses, ECC on larger cache.
Processor/memory bus: Usually ECC (SECDED).
RAM: Usually SECDED or better for better servers, parity for cheap servers, nothing @ low-end.
Internal I/O bus: Usually parity (PCI) or CRC (PCI-E).
Controller RAM: Usually parity for low-end controllers, rarely ECC for high-end controllers.
External I/O bus: Usually CRC.
Disk cache RAM: Usually parity for low-end disks, ECC for high-end disks.
Internal disk bus: Media ECC.
Read/write head: N/A, doesn't hold bits.
Media: Media ECC.

The disk, as it's transferring data from its cache to the media, adds a very large and complex error-correction coding to the data. This protects against a huge number of errors, 20 or more bits in a single 512-byte block. This is because the media is very noisy. So there is far *better* protection than a checksum for the data once it gets to the disk, and you can't possibly (well, not within any reasonable probability) return bad data from disk. You'll get an I/O error ("media error" in SCSI parlance) instead.

ZFS protects against an error introduced between memory and the disk. "Aha!", you say, "there's a lot of steps there, and we could get an error at any point!" There are a lot of points there, but very few where the data isn't already protected by either CRC or parity.

(Why do controllers usually use parity internally? The same reason the processor uses parity for L1; access is speed-critical, and the data is "live" in the cache/FIFO for such a small amount of time that the probability of a multi-bit error is negligible.)

> Now you may claim that this type of error happens so infrequently that
> it's not worth it.

I do claim that the error you described -- a bit error on the disk, undetected by the disk's ECC -- is infrequent to the point of being negligible. The much more frequent case, an error which is detected but not corrected by ECC, is handled by simple mirroring.

Anton

This message posted from opensolaris.org
> > You have me at a disadvantage here, because I''m > not > > even a Unix (let alone Solaris and Linux) > aficionado. > > But don''t Linux snapshots in conjunction with > rsync > > (leaving aside other possibilities that I''ve never > > heard of) provide rather similar capabilities > (e.g., > > incremental backup or re-synching), especially > when > > used in conjunction with scripts and cron? > > > > > Which explains why you keep ranting without knowing > what you''re talking about.Au contraire, cookie: I present things in detail to make it possible for anyone capable of understanding the discussion to respond substantively if there''s something that requires clarification or further debate. You, by contrast, babble on without saying anything substantive at all - which makes you kind of amusing, but otherwise useless. You could at least have tried to answer my question above, since you took the trouble to quote it - but of course you didn''t, just babbled some more. Copy-on-write. Even a> bookworm with 0 real-life-experience should be able > to apply this one to a working situation.As I may well have been designing and implementing file systems since before you were born (or not: you just have a conspicuously callow air about you), my ''real-life'' experience with things like COW is rather extensive. And while I don''t have experience with Linux adjuncts like rsync, unlike some people I''m readily able to learn from the experience of others (who seem far more credible when describing their successful use of rsync and snapshots on Linux than anything I''ve seen you offer up here).> > There''s a reason ZFS (and netapp) can take snapshots > galore without destroying their filesystem > performance.Indeed: it''s because ZFS already sacrificed a significant portion of that performance by disregarding on-disk contiguity, so there''s relatively little left to lose. By contrast, systems that respect the effects of contiguity on performance (and WAFL does to a greater degree than ZFS) reap its benefits all the time (whether snapshots exist or not) while only paying a penalty when data is changed (and they don''t have to change as much data as ZFS does because they don''t have to propagate changes right back to the root superblock on every update). It is possible to have nearly all of the best of both worlds, but unfortunately not with any current implementations that I know of. ZFS could at least come considerably closer, though, if it reorganized opportunistically as discussed in the database thread. (By the way, since we''re talking about snapshots here rather than about clones it doesn''t matter at all how many there are, so your ''snapshots galore'' bluster above is just more evidence of your technical incompetence: with any reasonable implementation the only run-time overhead occurs in keeping the most recent snapshot up to date, regardless of how many older snapshots may also be present.) But let''s see if you can, for once, actually step up to the plate and discuss something technically, rather than spout buzzwords that you apparently don''t come even close to understanding: Are you claiming that writing snapshot before-images of modified data (as, e.g., Linux LVM snapshots do) for the relatively brief period that it takes to transfer incremental updates to another system ''destroys'' performance? 
First of all, that''s clearly dependent upon the update rate during that interval, so if it happens at a quiet time (which presumably would be arranged if its performance impact actually *was* a significant issue) your assertion is flat-out-wrong. Even if the snapshot must be processed during normal operation, maintaining it still won''t be any problem if the run-time workload is read-dominated. And I suppose Sun must be lying in its documentation for fssnap (which Sun has offered since Solaris 8 with good old update-in-place UFS) where it says "While the snapshot is active, users of the file system might notice a slight performance impact [as contrasted with your contention that performance is ''destroyed''] when the file system is written to, but they see no impact when the file system is read" (http://docsun.cites.uiuc.edu/sun_docs/C/solaris_9/SUNWaadm/SYSADV1/p185.html). You''d really better contact them right away and set them straight. Normal system cache mechanisms should typically keep about-to-be-modified data around long enough to avoid the need to read it back in from disk to create the before-image for modified data used in a snapshot, and using a log-structured approach to storing these BIs in the snapshot file or volume (though I don''t know what specific approaches are used in fssnap and LVM: do you?) would be extremely efficient - resulting in minimal impact on normal system operation regardless of write activity. C''mon, cookie: surprise us for once - say something intelligent. With guidance and practice, you might even be able to make a habit of it. - bill This message posted from opensolaris.org
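The mechanism being argued about here is easier to see side by side: a before-image snapshot pays a copy only on the first overwrite of each block, while a redirect-on-write design copies nothing at write time but relocates new data. The following is a toy model of both behaviors as described above, not LVM's, fssnap's, or ZFS's actual bookkeeping.

    class BeforeImageSnapshot:
        # fssnap/LVM style: the first overwrite of a block copies the old contents aside.
        def __init__(self, volume):
            self.volume = volume            # live blocks, updated in place
            self.before = {}                # block -> contents saved at first overwrite

        def write(self, block, data):
            if block not in self.before:    # extra I/O only on the FIRST overwrite of a block
                self.before[block] = self.volume[block]
            self.volume[block] = data

        def read_snapshot(self, block):
            return self.before.get(block, self.volume[block])


    class RedirectOnWrite:
        # ZFS style: new data goes to a new location; the snapshot keeps the old pointers.
        def __init__(self, volume):
            self.blocks = dict(volume)      # live view (conceptually, new block addresses)
            self.snapshot = dict(volume)    # frozen view: nothing is copied at write time

        def write(self, block, data):
            self.blocks[block] = data       # no before-image I/O, but the data moves on disk


    vol = {0: b"a", 1: b"b"}
    s = BeforeImageSnapshot(dict(vol))
    s.write(0, b"A"); s.write(0, b"AA")     # only the first write pays the copy cost
    assert s.read_snapshot(0) == b"a" and s.volume[0] == b"AA"

    r = RedirectOnWrite(vol)
    r.write(0, b"A")
    assert r.snapshot[0] == b"a" and r.blocks[0] == b"A"

The trade-off in the surrounding argument falls out of the model: the before-image scheme adds write-time work only while a snapshot is active and only once per block, while redirect-on-write keeps snapshots nearly free at the cost of changing where live data sits on disk.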
> NOTHING anton listed takes the place of ZFS

That's not surprising, since I didn't list any file systems.

Here's a few file systems, and some of their distinguishing features. None of them do exactly what ZFS does. ZFS doesn't do what they do, either.

QFS: Very, very fast. Supports segregation of data from metadata, and classes of data. Supports SAN access to data.

XFS: Also fast; works efficiently on multiprocessors (in part because allocation can proceed in parallel). Supports SAN access to data (CXFS). Delayed allocation allows temporary files to stay in memory and never even be written to disk (and improves contiguity of data on disk).

JFS: Another very solid journaled file system.

GPFS: Yet another SAN file system, with tighter semantics than QFS or XFS; highly reliable.

StorNext: Hey, it's another SAN file system! Guaranteed I/O rates (hmmm, which XFS has too, at least on Irix) -- a key for video use.

SAMFS: Integrated archiving -- got petabytes of data that you need virtually online? SAM's your man! (well, at least your file system)

AdvFS: A journaled file system with snapshots, integrated volume management, online defragmentation, etc.

VxFS: Everybody knows, right? Journaling, snapshots (including writable snapshots), highly tuned features for databases, block-level change tracking for more efficient backups, etc.

There are many, many different needs. There's a reason why there is no "one true file system."

-- Anton

> Better yet, you get back to writing that file system
> that's going to fix all these horrible deficiencies
> in zfs.

Ever heard of RMS?

A file system which supports not only sequential access to files, or random access, but keyed access. (e.g. "update the record whose key is 123")?

A file system which allowed any program to read any file, without needing to know about its internal format? (so such an indexed file could just be read as a sequence of ordered records by applications which processed ordinary text files.)

A file system which could be shared between two, or even more, running operating systems, with direct access from each system to the disks.

A file system with features like access control with alarms, MAC security on a per-file basis, multiple file versions, automatic deletion of temporary files, verify-after-write.

You probably wouldn't be interested; but others would. It solves a particular set of needs (primarily in the enterprise market). It did it very well. It did it some 30 years before ZFS. It's very much worthwhile listening to those who built such a system, and their experiences, if your goal is to learn about file systems. Even if they don't suffer fools gladly.

===

If you've got a problem for which ZFS is the best solution, great. Use it. But don't think that it solves every problem, nor that it's perfect for everyone -- even you.

(One particular area to think about -- how do you back up your multi-terabyte pool? And how do you restore an individual file from your backups?)

This message posted from opensolaris.org
from the description here

http://www.djesys.com/vms/freevms/mentor/rms.html

so who cares here ?

RMS is not a filesystem, but more a CAS type of data repository

On Dec 8, 2007 7:04 AM, Anton B. Rang <rang at acm.org> wrote:
> [...]

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/
> from the description here > > http://www.djesys.com/vms/freevms/mentor/rms.html > so who cares here ? > > > RMS is not a filesystem, but more a CAS type of data > repositorySince David begins his description with the statement "RMS stands for "Record Management Services". It is the underlying "file system" of OpenVMS", I''ll suggest that your citation fails a priori to support your allegation above. Perhaps you''re confused by the fact that RMS/Files-11 is a great deal *more* of a file system than most Unix examples (though ReiserFS was at least heading in somewhat similar directions). You might also be confused by the fact that VMS separates its file system facilities into an underlying block storage and directory layer specific to disk storage and the upper RMS deblocking/interpretation/pan-device layer, whereas Unix combines the two. Better acquainting yourself with what CAS means in the context of contemporary disk storage solutions might be a good idea as well, since it bears no relation to RMS (nor to virtually any Unix file system). - bill This message posted from opensolaris.org
can you guess?
2007-Dec-08 13:32 UTC
[zfs-discuss] OT: NTFS Single Instance Storage (Re: Yager on ZFS
> Wade.Stuart at fallon.com wrote: > > Darren, > > > > Do you happen to have any links for this? I > have not seen anything > > about NTFS and CAS/dedupe besides some of the third > party apps/services > > that just use NTFS as their backing store. > > Single Instance Storage is what Microsoft uses to > refer to this: > > http://research.microsoft.com/sn/Farsite/WSS2000.pdfWhile SIS is likely useful in certain environments, it is actually layered on top of NTFS rather than part of it - and in fact could in principle be layered on top of just about any underlying file system in any OS that supported layered ''filter'' drivers. File access to a shared file via SIS runs through an additional phase of directory look-up similar to that involved in following a symbolic link, and its described copy-on-close semantics require divided data access within the updater''s version of the file (fetching unchanged data from the shared copy and changed data from the to-be-fleshed-out-after-close copy) with apparently no mechanism to avoid the need to copy the entire file after close even if only a single byte within it has been changed (which could compromise its applicability in some environments). Nonetheless, unlike most dedupe products it does apply to on-line rather than backup storage, and Microsoft deserves credit for fielding it well in advance of the dedupe startups: once in a while they actually do produce something that qualifies as at least moderately innovative. NTFS was at least respectable if not ground-breaking as well when it first appeared, and it''s too bad that it has largely stagnated since while MS pursued its ''structured storage'' and similar dreams (one might suspect in part to try to create a de facto storage standard that competitors couldn''t easily duplicate, limiting the portability of applications built to take advantage of its features without attracting undue attention from trust-busters, such as they are these days - but perhaps I''m just too cynical). - bill This message posted from opensolaris.org
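The divided read path and copy-on-close behavior described above can be modeled roughly as follows. This is a toy reconstruction of the description in this post, not Microsoft's implementation; the class and method names are hypothetical.

    class SISLink:
        # Toy model of the behavior described above: reads merge per-file changes over the
        # shared common-store copy; closing a modified file materializes a full private copy.
        def __init__(self, common_store):
            self.common = common_store      # the single shared copy
            self.delta = {}                 # offset -> changed byte, private to this link
            self.private = None             # full copy, created only after a dirty close

        def write(self, offset, value):
            self.delta[offset] = value      # changes accumulate without touching the shared copy

        def read(self, offset):
            if self.private is not None:
                return self.private[offset]
            return self.delta.get(offset, self.common[offset])   # divided data access

        def close(self):
            if self.delta:                  # even a one-byte change triggers a whole-file copy
                data = bytearray(self.common)
                for off, val in self.delta.items():
                    data[off] = val
                self.private = bytes(data)

    f = SISLink(b"shared contents")
    f.write(0, ord("S"))
    f.close()
    assert f.read(0) == ord("S") and len(f.private) == len(b"shared contents")

The whole-file copy in close() is the behavior the post flags as a potential limitation: a single changed byte un-shares the entire file.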
can you run a database on RMS? I guess it's not suited. We are already trying to get rid of a 15-year-old filesystem called WAFL, and a 10-year-old "file system" called Centera, so do you think we are going to consider a 35-year-old filesystem now... computer science has made a lot of improvement since.

On Dec 8, 2007 1:38 PM, can you guess? <billtodd at metrocast.net> wrote:
> Since David begins his description with the statement "RMS stands for "Record Management Services". It is the underlying "file system" of OpenVMS", I'll suggest that your citation fails a priori to support your allegation above.
> [...]

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/
> can you run a database on RMS?

As well as you could on most Unix file systems. And you've been able to do so for almost three decades now (whereas features like asynchronous and direct I/O are relative newcomers in the Unix environment).

> I guess it's not suited

And you guess wrong: that's what happens when you speak from ignorance rather than from something more substantial.

> we are already trying to get rid of a 15-year-old filesystem called WAFL,

Whatever for? Please be specific about exactly what you expect will work better with whatever you're planning to replace it with - and why you expect it to be anywhere nearly as solid.

> and a 10-year-old "file system" called Centera,

My, you must have been one of the *very* early adopters, since EMC launched it only 5 1/2 years ago.

> so do you think we are going to consider a 35-year-old filesystem now... computer science has made a lot of improvement since

Well yes, and no. For example, most Unix platforms are still struggling to match the features which VMS clusters had over two decades ago: when you start as far behind as Unix did, even continual advances may still not be enough to match such 'old' technology.

Not that anyone was suggesting that you replace your current environment with RMS: if it's your data, knock yourself out using whatever you feel like using. On the other hand, if someone else is entrusting you with *their* data, they might be better off looking for someone with more experience and sense.

- bill
can you guess? wrote:
>> can you run a database on RMS?
>
> As well as you could on most Unix file systems. And you've been able to do so for almost three decades now (whereas features like asynchronous and direct I/O are relative newcomers in the Unix environment).

Funny, I remember trying to help customers move their applications from TOPS-20 to VMS, back in the early 1980s, and finding that the VMS I/O capabilities were really badly lacking. RMS was an abomination -- nothing but trouble, and another layer to keep you away from your data.

Of course, TOPS-20 isn't Unix; it's one of the things the original Unix developers couldn't afford, so they had to try to write something that would work for them and would run on hardware they *could* afford (the other one was Multics, of course).

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
> can you guess? wrote:
>>> can you run a database on RMS?
>>
>> As well as you could on most Unix file systems. And you've been able to do so for almost three decades now (whereas features like asynchronous and direct I/O are relative newcomers in the Unix environment).
>
> Funny, I remember trying to help customers move their applications from TOPS-20 to VMS, back in the early 1980s, and finding that the VMS I/O capabilities were really badly lacking.

Funny how that works: when you're not familiar with something, you often mistake your own ignorance for actual deficiencies. Of course, the TOPS-20 crowd was extremely unhappy at being forced to migrate at all, and this hardly improved their perception of the situation.

If you'd like to provide specifics about exactly what was supposedly lacking, it would be possible to evaluate the accuracy of your recollection.

> RMS was an abomination -- nothing but trouble,

Again, specifics would allow an assessment of that opinion.

> and another layer to keep you away from your data.

Real men use raw disks, of course. And with RMS (unlike Unix systems of that era) you could get very close to that point if you wanted to without abandoning the file level of abstraction - or work at a considerably more civilized level if you wanted that with minimal sacrifice in performance (again, unlike the Unix systems of that era, where storage performance was a joke until FFS began to improve things - slowly).

VMS and RMS represented a very different philosophy than Unix: you could do anything, and therefore were exposed to the complexity that this flexibility entailed. Unix let you do things one simple way - whether it actually met your needs or not. Back then, efficient use of processing cycles (even in storage applications) could be important - and VMS and RMS gave you that option. Nowadays, trading off cycles to obtain simplicity is a lot more feasible, and the reasons for the complex interfaces of yesteryear can be difficult to remember.

- bill
grand-dad, why don't you put your immense experience and knowledge to contribute to what is going to be the next and only filesystem in modern operating systems, instead of spending your time asking for "specifics" and treating everyone as "ignorant"... at least we will remember you in the afterlife as being a major contributor to ZFS's success.

Considering that you have never been considered by anyone until now (except your dog?)... who has ever heard of you?? Have you ever published anything worth reading? Give us some of your mighty accomplishments.

Remember, now it's about open-sourcing, reducing complexity and cost... keep the old proprietary things in DEC's drawers and bring us real ideas

s-

On Dec 9, 2007 4:32 AM, can you guess? <billtodd at metrocast.net> wrote:
> > can you run a database on RMS?
>
> As well as you could on most Unix file systems. And you've been able to do so for almost three decades now (whereas features like asynchronous and direct I/O are relative newcomers in the Unix environment).
> [...]

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/
can you guess? wrote:>> can you guess? wrote: >> >>>> can you run a database on RMS? >>>> >>>> >>> As well as you could on must Unix file systems. >>> >> And you''ve been able to do so for almost three >> decades now (whereas features like asynchronous and >> direct I/O are relative newcomers in the Unix >> environment). >> >> nny, I remember trying to help customers move their >> applications from >> TOPS-20 to VMS, back in the early 1980s, and finding >> that the VMS I/O >> capabilities were really badly lacking. >> > > Funny how that works: when you''re not familiar with something, you often mistake your own ignorance for actual deficiencies. Of course, the TOPS-20 crowd was extremely unhappy at being forced to migrate at all, and this hardly improved their perception of the situation. > > If you''d like to provide specifics about exactly what was supposedly lacking, it would be possible to evaluate the accuracy of your recollection. >I''ve played this game before, and it''s off-topic and too much work to be worth it. Researching exactly when specific features were released into VMS RMS from this distance would be a total pain, and then we''d argue about which ones were beneficial for which situations, which people didin''t much agree about then or since. My experience at the time was that RMS was another layer of abstraction and performance loss between the application and the OS, and it made it harder to do things and it made them slower and it made files less interchangeable between applications; but I''m not interested in trying to defend this position for weeks based on 25-year-old memories. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
>>> ... I remember trying to help customers move their applications from TOPS-20 to VMS, back in the early 1980s, and finding that the VMS I/O capabilities were really badly lacking.
>>
>> Funny how that works: when you're not familiar with something, you often mistake your own ignorance for actual deficiencies. Of course, the TOPS-20 crowd was extremely unhappy at being forced to migrate at all, and this hardly improved their perception of the situation.
>>
>> If you'd like to provide specifics about exactly what was supposedly lacking, it would be possible to evaluate the accuracy of your recollection.
>
> I've played this game before, and it's off-topic and too much work to be worth it.

In other words, you've got nothing, but you'd like people to believe it's something. The phrase "Put up or shut up" comes to mind.

> Researching exactly when specific features were released into VMS RMS from this distance would be a total pain,

I wasn't asking for anything like that: I was simply asking for specific examples of the "VMS I/O capabilities" that you allegedly 'found' "were really badly lacking" "in the early 1980s". Even if the porting efforts you were involved in predated the pivotal cancellation of Jupiter in 1983, that was still close enough to the VMS cluster release that most VMS development effort had turned in that direction (i.e., the single-system VMS I/O subsystem had pretty well reached maturity), so there won't be any need to quibble about what shipped when. Surely if you had a sufficiently strong recollection to be willing to make such a definitive assertion you can remember *something* specific.

> and then we'd argue about which ones were beneficial for which situations, which people didn't much agree about then or since.

No, no, no: you're reading far more generality into this than I ever suggested. I'm not asking you to judge what was useful, and I couldn't care less whether you thought the features that VMS had and TOPS lacked were valuable: I'm just asking you to be specific about what "VMS I/O capabilities" you claim were seriously deficient.

> My experience at the time was that RMS was another layer of abstraction and performance loss between the application and the OS,

Ah - your 'experience'. So you actually measured RMS's effect on performance, rather than just SWAGged that adding a layer that you found unappealing in a product that your customers were angry about having to move to Must Be A Bad Idea? What was the quantitative result of that measurement, and how was RMS configured for the relevant workload?

After all, the extra layer wasn't introduced just to give you something to complain about: it was there to provide additional features and configuration flexibility (much of it performance-related), as described above. If you didn't take advantage of those facilities, that could be a legitimate *complexity* knock against the environment but it's not a legitimate *capability* or *performance* knock (rather the opposite, in fact).

> and it made it harder to do things

If you were using the RMS API itself rather than accessing RMS through a higher-level language that provided simple I/O handling for simple I/O needs, that was undoubtedly the case: as I observed above, that's a price that VMS was happy to pay for providing complete control to applications that wanted it. RMS was designed from the start to provide that alternative with the understanding that access via higher-level language mechanisms would usually be used by those people who didn't need the low-level control that the native RMS API provided.

> and it made them slower

That's the second time you've claimed that, so you'll really at least have to describe *how* you measured this even if the detailed results of those measurements may be lost in the mists of time.

> and it made files less interchangeable between applications;

That would have been some trick, given that RMS supported pure byte-stream files as well as its many more structured types (and I'm pretty sure that the C run-time system took this approach, using RMS direct I/O and doing its own deblocking to ensure that some of the more idiomatic C activities like single-character reads and writes would not inadvertently perform poorly). So at worst you could have used precisely the same in-file formats that were being used in the TOPS-20 environment and achieved the same degree of portability (unless you were actually encountering peculiarities in language access rather than in RMS itself: I'm considerably less familiar with that end of the environment).

> but I'm not interested in trying to defend this position for weeks based on 25-year-old memories.

So far you don't really have much of a position to defend at all: rather, you sound like a lot of the disgruntled TOPS users of that era. Not that they didn't have good reasons to feel disgruntled - but they frequently weren't very careful about aiming their ire accurately.

Given that RMS really was *capable* of coming very close to the performance capabilities of the underlying hardware, your allegations just don't ring true. Not being able to jump into VMS with little preparation and get the same performance you were used to in the TOPS-20 environment is quite believable, but not the suggestion that it was not *possible* to achieve at least equal performance had you been well-versed in the VMS environment.

Decent *default* behavior is one of the things that VMS should have learned more from Unix about - especially as the years went by and decreasing memory prices made use of a central system file cache (which IIRC VMS didn't acquire until the late '80s) much more feasible (even in relatively low-end systems) than was the case in the '70s when VMS was designed. RMS's default application buffer size of 8 KB didn't get increased for another decade or more after that: sure, you could increase that size explicitly, and use multiple larger buffers, and even stream data through them using transparent asynchronous multi-buffering, but the *default* became ridiculous. So it seems a lot more likely to me that you got bitten by default behavior tuned for small-system environments than by actual hard limitations - which was my initial observation to your original post.

- bill
> why don't you put your immense experience and knowledge to contribute to what is going to be the next and only filesystem in modern operating systems,

Ah - the pungent aroma of teenage fanboy wafts across the Net.

ZFS is not nearly good enough to become what you suggest above, nor is it amenable to some of the changes necessary to make it good enough. So while I'm happy to give people who have some personal reason to care about it pointers on how it could be improved, I have no interest in working on it myself.

> instead of spending your time asking for "specifics"

You'll really need to learn to pay a lot more attention to specifics yourself if you have any desire to become technically competent when you grow up.

> and treating everyone as "ignorant"

I make some effort only to treat the ignorant as ignorant. It's hardly my fault that they are so common around here, but I'd like to think that there's a silent majority of more competent individuals in the forum who just look on quietly (and perhaps somewhat askance). It used to be that the ignorant felt motivated to improve themselves, but now they seem more inclined to engage in aggressive denial (which may be easier on the intellect but seems a less productive use of energy).

- bill
Hello can, Monday, December 10, 2007, 3:35:27 AM, you wrote: cyg> and it>> made them slowercyg> That''s the second time you''ve claimed that, so you''ll really at cyg> least have to describe *how* you measured this even if the cyg> detailed results of those measurements may be lost in the mists of time. cyg> So far you don''t really have much of a position to defend at cyg> all: rather, you sound like a lot of the disgruntled TOPS users cyg> of that era. Not that they didn''t have good reasons to feel cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. cyg> Given that RMS really was *capable* of coming very close to the cyg> performance capabilities of the underlying hardware, your cyg> allegations just don''t ring true. Not being able to jump into And where is your "proof" that it "was capable of coming very close to the..."? Let me use your own words: "In other words, you''ve got nothing, but you''d like people to believe it''s something. The phrase "Put up or shut up" comes to mind." Where are your proofs on some of your claims about ZFS? Where are your detailed concepts how to solve some ZFS issues (imagined or not)? Demand nothing less from yourself than you demand from others. Bill, to be honest I don''t understand you - you wrote "I have no interest in working on it myself". So what is your interest here? The way you respond to people is offensive some times (don''t bother to say that they deserve it... it''s just your opinion) and your attitude from time to time is of a "guru" who knows everything but doesn''t actually deliver anything. So, except that you "fighting" ZFS everywhere you can, you don''t want to contribute to ZFS - what you want then? You seem like a guy with a quite good technical background (just an impression) who wants to contribute something but doesn''t know exactly what... Maybe you should try to focus that knowledge a little bit more and get something useful out of it instead of writing long "essays" which doesn''t contribute much (not that this reply isn''t long :)) I''m not being malicious here - I''m genuinely interested in what''s your agenda. I don''t blame other people accusing you of trolling. No offense intended. :) -- Best regards, Robert Milkowski mailto:rmilkowski at task.gda.pl http://milek.blogspot.com
> Monday, December 10, 2007, 3:35:27 AM, you wrote: > > cyg> and it >>> made them slower > > cyg> That''s the second time you''ve claimed that, so you''ll really at > cyg> least have to describe *how* you measured this even if the > cyg> detailed results of those measurements may be lost in the mists of time. > > > cyg> So far you don''t really have much of a position to defend at > cyg> all: rather, you sound like a lot of the disgruntled TOPS users > cyg> of that era. Not that they didn''t have good reasons to feel > cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. > > cyg> Given that RMS really was *capable* of coming very close to the > cyg> performance capabilities of the underlying hardware, your > cyg> allegations just don''t ring true. Not being able to jump into > > And where is your "proof" that it "was capable of coming very close to > the..."?It''s simple: I *know* it, because I worked *with*, and *on*, it - for many years. So when some bozo who worked with people with a major known chip on their shoulder over two decades ago comes along and knocks its capabilities, asking for specifics (not even hard evidence, just specific allegations which could be evaluated and if appropriate confronted) is hardly unreasonable. Hell, *I* gave more specific reasons why someone might dislike RMS in particular and VMS in general (complex and therefore user-unfriendly low-level interfaces and sometimes poor *default* performance) than David did: they just didn''t happen to match those that he pulled out of (whereever) and that I challenged.> Let me use your own words: > > "In other words, you''ve got nothing, but you''d like people to believe it''s something. > > The phrase "Put up or shut up" comes to mind." > > Where are your proofs on some of your claims about ZFS?Well, aside from the fact that anyone with even half a clue knows what the effects of uncontrolled file fragmentation are on sequential access performance (and can even estimate those effects within moderately small error bounds if they know what the disk characteristics are and how bad the fragmentation is), if you''re looking for additional evidence that even someone otherwise totally ignorant could appreciate there''s the fact that Unix has for over two decades been constantly moving in the direction of less file fragmentation on disk - starting with the efforts that FFS made to at least increase proximity and begin to remedy the complete disregard for contiguity that the early Unix file system displayed and to which ZFS has apparently regressed, through the additional modifications that Kleiman and McVoy introduced in the early ''90s to group 56 KB of blocks adjacently when possible, through the extent-based architectures of VxFS, XFS, JFS, and soon-to-be ext4 file systems (I''m probably missing others here): given the relative changes between disk access times and bandwidth over the past decade and a half, ZFS with its max 128 KB blocks in splendid isolation offers significantly worse sequential performance relative to what''s attainable than the systems that used 56 KB aggregates back then did (and they weren''t all that great in that respect). Given how slow Unix was to understand and start to deal with this issue, perhaps it''s not surprising how ignorant some Unix people still are - despite the fact that other platforms fully understood the problem over three decades ago. 
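To put rough numbers on that: the following back-of-envelope sketch (in Python, with an assumed seek time and transfer rate for a circa-2007 SATA drive - illustrative guesses, not measurements of ZFS or any other file system) shows how strongly the size of each contiguous fragment governs sequential throughput once every fragment costs a full disk access.

# Back-of-envelope model of sequential-read throughput for a file broken
# into fixed-size fragments scattered across a disk.  The disk parameters
# are assumptions (roughly a 2007-era SATA drive), not measurements.

avg_access_ms = 8.0      # assumed average seek + rotational delay per fragment
streaming_mb_s = 60.0    # assumed media transfer rate once the head is positioned

def effective_throughput(fragment_kb):
    """MB/s achieved when every fragment of this size costs one full access."""
    transfer_s = (fragment_kb / 1024.0) / streaming_mb_s
    access_s = avg_access_ms / 1000.0
    return (fragment_kb / 1024.0) / (access_s + transfer_s)

for frag_kb in (128, 1024, 8192):   # 128 KB blocks vs. progressively larger extents
    print(f"{frag_kb:5d} KB fragments: ~{effective_throughput(frag_kb):.1f} MB/s")

# With these assumed numbers:
#   128 KB fragments: ~12.4 MB/s
#  1024 KB fragments: ~40.5 MB/s
#  8192 KB fragments: ~56.6 MB/s

With those assumptions a file scattered in 128 KB pieces streams at only about a fifth of the disk's media rate, while multi-megabyte extents recover most of it - which is the point of the contiguity efforts listed above.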
Last I knew, ZFS was still claiming that it needed nothing like defragmentation, while describing write allocation mechanisms that could allow disastrous degrees of fragmentation under conditions that I''ve described quite clearly. If ZFS made no efforts whatsoever in this respect the potential for unacceptable performance would probably already have been obvious even to its blindest supporters, so I suspect that when ZFS is given the opportunity by a sequentially-writing application that doesn''t force every write (or by use of the ZIL in some cases) it aggregates blocks in a file together in cache and destages them in one contiguous chunk to disk (rather than just mixing blocks willy-nilly in its batch disk writes) - and a lot of the time there''s probably not enough other system write activity to make this infeasible, so that people haven''t found sequential streaming performance to be all that bad most of the time (especially on the read end if their systems are lightly loaded and the fact that their disks may be working a lot harder than they ought to have to is not a problem). But the potential remains for severe fragmention under heavily parallel access conditions, or when a file is updated at fine grain but then read sequentially (the whole basis of the recent database thread), and with that fragmentation comes commensurate performance degradation. And even if you''re not capable of understanding why yourself you should consider it significant that no one on the ZFS development team has piped up to say otherwise. Then there''s RAID-Z, which smears individual blocks across multiple disks in a manner that makes small-to-medium random access throughput suck. Again, this is simple logic and physics: if you understand the layout and the disk characteristics, you can predict the effects on a heavily parallel workload with fairly decent accuracy (I think that Roch mentioned this casually at one point, so it''s hardly controversial, and I remember reading a comment by Jeff Bonwick that he was pleased with the result of one benchmark - which made no effort to demonstrate the worst case - because the throughput penalty was ''only'' a factor of 2 rather than the full factor of N). And the way ZFS aparently dropped the ball on its alleged elimination of any kind of ''volume management'' by requiring that users create explicit (and matched) aggregations of disks to support mirroring and RAID-Z. I''m not the only one here who has observed that handling data redundancy more in the manner of ditto blocks would not only improve transparency in this area but be more consistent with the rest of ZFS''s approach to disk placement. Now, if someone came up with any kind of credible rebuttal to these assertions we could at least discuss it on technical grounds. But (and again you should consider this significant) no one has: all we have is well-reasoned analysis on the one hand and some (often fairly obnoxious) fanboy babble on the other. If you step back, make the effort required to *understand* that analysis, and try to look at the situation objectively, which do you find more credible? ZFS has other deficiencies, but they''re more fundamental choices involving poor trade-offs and lack of vision than outright (and easily rectifiable) flaws, so they could more justifiably be termed ''judgment calls'' and I haven''t delved as deeply into them. 
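The RAID-Z small-read point above reduces to a similarly simple model. The figures below are assumptions rather than benchmarks: they just express the claim that a block smeared across every data disk in the group turns the whole group into one logical actuator for small random reads, while a RAID-5-style layout that keeps each block on a single disk lets the spindles service independent reads in parallel.

# Simple model of small random-read throughput for a single-parity group of
# N data disks.  All numbers are assumptions, not benchmarks.

disk_iops = 120     # assumed random reads/second a single spindle can deliver
data_disks = 7      # e.g. an 8-disk group with one parity disk

# RAID-5-style layout: each block lives on one disk, so independent small
# reads can be serviced by different spindles in parallel.
raid5_read_iops = disk_iops * data_disks

# RAID-Z-style layout: each block is spread across every data disk in the
# group, so one small read occupies the whole group.
raidz_read_iops = disk_iops

print(f"RAID-5-style layout: ~{raid5_read_iops} small random reads/s")
print(f"RAID-Z-style layout: ~{raidz_read_iops} small random reads/s")
# ~840 vs ~120 with these assumptions -- the factor-of-N penalty discussed
# above; caching and mixed workloads will land real systems somewhere in between.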
But they''re the main reason I have no interest in ''working on'' ZFS: while it would be a quantitative exaggeration to characterize doing so as ''putting lipstick on a pig'' (ZFS isn''t a real pig - just one more decent but limited file system among many, based on a couple of good but hardly original ideas - COW and transparently-managed space pools - applied sub-optimally in its implementation), that metaphor qualitatively captures the sense of my feelings toward it.> Where are your detailed concepts how to solve some ZFS issues > (imagined or not)?Right here in this thread and in the database thread. I described how to reorganize opportunistically to limit fragmentation performance penalties (in a way that would not require any change in the on-disk structure) and how to replace RAID-Z with a RAID-5-like implementation to fix the throughput problem (though that would require extensions to ZFS''s metadata pointer format). Perhaps it would behoove you to get up to speed before presuming to comment (though that doesn''t seem to be very traditional in this community, so you might wind up feeling out of place).> > Demand nothing less from yourself than you demand from others.I do. And I seldom disappoint myself in that respect.> > Bill, to be honest I don''t understand youOr most of what I''ve said, by all appearances. - you wrote "I have no> interest in working on it myself". So what is your interest here?You really haven''t bothered to read much at all, have you. I''ve said, multiple times, that I came here initially in the hope of learning something interesting. More recently, I came here because I offered a more balanced assessment of ZFS''s strengths and weaknesses in responding to the Yager article and wanted to be sure that I had not treated ZFS unfairly in some way - which started this extended interchange. After that, I explained that while the likelihood of learning anything technical here was looking pretty poor, I didn''t particularly like some of the personal attacks that I''d been subject to and had decided to confront them.> The way you respond to people is offensive some times (don''t bother to > say that they deserve it... it''s just your opinion)Yes, it is - and that fully explains my behavior in that regard. Whether you agree with my assessment (or the appropriateness of my response) is not of significant interest to me. and your attitude> from time to time is of a "guru" who knows everything but doesn''t > actually deliver anything.No, my attitude is that people too stupid and/or too lazy to understand what I *have* been delivering don''t deserve much respect if they complain.> > So, except that you "fighting" ZFS everywhere you can,It''s hardly a pervasive effort: I only take ZFS on when I happen to encounter fanboys who need straightening out. The first such was Robin Harris at his StoragaMojo and ZDNet blogs (which I occasionally visit), which led me to the controversy on the Hitz and Schwartz blogs and the Yager article, my response to which brought me back here. Had Jonathan not actively offended my sense of propriety by lying about what had initiated the patent skirmish and then cynically attempting to enlist the open source movement in what was a purely inter-corporate squabble I''d probably not have spent as much time and effort as I have. 
But that kind of high-level corporate sleaziness *really* disgusts me: I spent over 5 years playing truth-squad against Itanic to make sure that Compaq/HP paid a price for the way they screwed and lied to Alpha customers (and combined with the efforts of other like-minded people we actually succeeded pretty well, almost certainly costing them billions of dollars in profit given that Alpha systems provided a high-margin $7 billion annual revenue stream before the Alphacide and that most customers seemed inclined to follow along like sheep afterward until we shoved the pile so closely under their noses that they couldn''t help noticing the stench), so what I''ve been doing with ZFS is really small potatoes by comparison. you don''t want> to contribute to ZFSI just have no interest in *working* on it: if I were actively avoiding contributing at all, I wouldn''t have explained what some of its problems are and how to correct them.> - what you want then?My answers above should have made that clear by now (not that they include any information in that regard that I haven''t already presented before here, but now you''ve got the Readers Digest Condensed Version as well). - bill This message posted from opensolaris.org
Hello can, Tuesday, December 11, 2007, 6:57:43 PM, you wrote:>> Monday, December 10, 2007, 3:35:27 AM, you wrote: >> >> cyg> and it >>>> made them slower >> >> cyg> That''s the second time you''ve claimed that, so you''ll really at >> cyg> least have to describe *how* you measured this even if the >> cyg> detailed results of those measurements may be lost in the mists of time. >> >> >> cyg> So far you don''t really have much of a position to defend at >> cyg> all: rather, you sound like a lot of the disgruntled TOPS users >> cyg> of that era. Not that they didn''t have good reasons to feel >> cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. >> >> cyg> Given that RMS really was *capable* of coming very close to the >> cyg> performance capabilities of the underlying hardware, your >> cyg> allegations just don''t ring true. Not being able to jump into >> >> And where is your "proof" that it "was capable of coming very close to >> the..."?cyg> It''s simple: I *know* it, because I worked *with*, and *on*, it cyg> - for many years. So when some bozo who worked with people with cyg> a major known chip on their shoulder over two decades ago comes cyg> along and knocks its capabilities, asking for specifics (not even cyg> hard evidence, just specific allegations which could be evaluated cyg> and if appropriate confronted) is hardly unreasonable. Bill, you openly criticize people (their work) who have worked on ZFS for years... not that there''s anything wrong with that, just please realize that because you were working on it it doesn''t mean it is/was perfect - just the same as with ZFS. I know, everyone loves their baby... Nevertheless just because you were working on and with it, it''s not a proof. The person you were replaying to was also working with it (but not on it I guess). Not that I''m interested in such a proof. Just noticed that you''re demanding some proof, while you are also just write some statements on its performance without any actual proof.>> Let me use your own words: >> >> "In other words, you''ve got nothing, but you''d like people to believe it''s something. >> >> The phrase "Put up or shut up" comes to mind." >> >> Where are your proofs on some of your claims about ZFS?cyg> Well, aside from the fact that anyone with even half a clue cyg> knows what the effects of uncontrolled file fragmentation are on cyg> sequential access performance (and can even estimate those cyg> effects within moderately small error bounds if they know what cyg> the disk characteristics are and how bad the fragmentation is), cyg> if you''re looking for additional evidence that even someone cyg> otherwise totally ignorant could appreciate there''s the fact that I''ve never said there are not fragmentation problems with ZFS. Well, actually I''ve been hit by the issue in one environment. Also you haven''t done your work home properly, as one of ZFS developers actually stated they are going to work on ZFS de-fragmentation and disk removal (pool shrinking). See http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠 Lukasz happens to be my friend who is also working with the same environment. The point is, and you as a long time developer (I guess) should know it, you can''t have everything done at once (lack of resources, and it takes some time anyway) so you must prioritize. 
ZFS is open source and if someone thinks that given feature is more important than the other he/she should try to fix it or at least voice it here so ZFS developers can possibly adjust their priorities if there''s good enough and justified demand. Now the important part - quite a lot of people are using ZFS, from desktop usage, their laptops, small to big production environments, clustered environments, SAN environemnts, JBODs, entry-level to high-end arrays, different applications, workloads, etc. And somehow you can''t find many complaints about ZFS fragmentation. It doesn''t mean the problem doesn''t exist (and I know it first hand) - it means that for whatever reason for most people using ZFS it''s not a big problem if problem at all. However they do have other issues and many of them were already addressed or are being addressed. I would say that ZFS developers at least try to listen to the community. Why am I asking for a proof - well, given constrains on resources, I would say we (not that I''m ZFS developer) should focus on actual problems people have with ZFS rather then theoretical problems (which in some environments/workloads will show up and sooner or later they will have to be addressed too). Then you find people like Pawel Jakub Davidek (guy who ported ZFS to FreeBSD) who started experimenting with RAID-5 like implementation with ZFS - he provided even some numbers showing it might be worth looking at. That''s what community is about. I don''t see any point complaining about ZFS all over again - have you actually run into the problem with ZFS yourself? I guess not. You just assuming (correctly for some usage cases). I guess your message has been well heard. Since you''re not interested in anything more that bashing or complaining all the time about the same theoretical "issues" rather than contributing somehow (even by providing some test results which could be repeated) I wouldn''t wait for any positive feedback if I were you - anyway, what kind of feedback are you waiting for? cyg> Last I knew, ZFS was still claiming that it needed nothing like cyg> defragmentation, while describing write allocation mechanisms cyg> that could allow disastrous degrees of fragmentation under cyg> conditions that I''ve described quite clearly. Well, I haven''t talked to ZFS (yet) so I don''t know what he claims :)) If you are talking about ZFS developers then you can actually find some evidence that they do see that problem and want to work on it. Again see for example: http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠 Bill, at least look at the list archives first. And again, "under conditions that I''ve described quite clearly." - that''s exactly the problem. You''ve just described something while others do have actual and real problems which should be addressed first. cyg> If ZFS made no cyg> efforts whatsoever in this respect the potential for unacceptable cyg> performance would probably already have been obvious even to its cyg> blindest supporters, Well, is it really so hard to understand that a lot of people use ZFS because it actually solves their problems? No matter what case scenarios you will find to theoretically show some ZFS weaker points, at the end what matters is if it does solve customer problems. And for many users it does, definitely not for all of them. I would argue that no matter what file system you will test or even design, one can always find a corner cases when it will behave less than optimal. 
For a general purpose file system what matters is that in most common cases it''s good enough. cyg> willy-nilly in its batch disk writes) - and a lot of the time cyg> there''s probably not enough other system write activity to make cyg> this infeasible, so that people haven''t found sequential cyg> streaming performance to be all that bad most of the time cyg> (especially on the read end if their systems are lightly load cyg> ed and the fact that their disks may be working a lot harder cyg> than they ought to have to is not a problem). Now you closer to the point. If the problem you are describing does not hit most people, we should put more effort solving problems which people are actually experiencing. cyg> Then there''s RAID-Z, which smears individual blocks across cyg> multiple disks in a manner that makes small-to-medium random cyg> access throughput suck. Again, this is simple logic and physics: cyg> if you understand the layout and the disk characteristics, you cyg> can predict the effects on a heavily parallel workload with cyg> fairly decent accuracy (I think that Roch mentioned this casually cyg> at one point, so it''s hardly controversial, and I remember cyg> reading a comment by Jeff Bonwick that he was pleased with the cyg> result of one benchmark - which made no effort to demonstrate the cyg> worst case - because the throughput penalty was ''only'' a factor cyg> of 2 rather than the full factor of N). Yeah, nothing really new here. If you need a guy from Sun, then read Roch''s post on RAID-Z performance. Nothing you''ve discovered here. Nevertheless RAID-Z[2] is good enough for many people. I know that simple logic and physics states that relativity equations provide better accuracy than Newton''s - nevertheless in most scenarios I''m dealing with it doesn''t really matter from a practical point of view. Then, in some environments RAID-Z2 (on JBOD) actually provides better performance than RAID-5 (and HW R5 for that matter). And, opposite to you, I''m not speculating but I''ve been working with such environment (lot of concurrent writes which are more critical than much less reads later). So when you saying that RAID-Z is brain-damaging - well, it''s mostly positive experience of a lot of people with RAID-Z vs. your statement without any real-world backing. Then, of course, one can produce a test or even real work environment where there will be lot of small and simultaneous reads, without any writes, from a dataset much bigger than available memory and RAID-Z would be much slower than RAID-5. Not that it''s novel - it was well discussed since RAID-Z was introduced. That''s one of the reasons why people use ZFS on-top of HW RAID-5 luns from time to time. cyg> And the way ZFS aparently dropped the ball on its alleged cyg> elimination of any kind of ''volume management'' by requiring that cyg> users create explicit (and matched) aggregations of disks to cyg> support mirroring and RAID-Z. # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4 # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4 # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT bill 373M 90K 373M 0% ONLINE - # # mkfile 128m f11 ; mkfile 256m f44 # zpool destroy bill # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44 # zfs list NAME USED AVAIL REFER MOUNTPOINT bill 101K 715M 32.6K /bill # (2*128+2*256=768) - looks fine. 
If you are talking about a solution which enables user to mix different disk sizes in the same mirror or RAID-5 group and while all the time providing given protection allows you to utilize 100% of all disk capacities.... well, what is that solution? Is it free? Open source? Available on general purpose OS? Or commodity HW? Available at all? :P cyg> Now, if someone came up with any kind of credible rebuttal to cyg> these assertions we could at least discuss it on technical cyg> grounds. But (and again you should consider this significant) no cyg> one has: all we have is well-reasoned analysis on the one hand cyg> and some (often fairly obnoxious) fanboy babble on the other. If cyg> you step back, make the effort required to *understand* that cyg> analysis, and try to look at the situation objectively, which do you find more credible? Most credible to me is actual user experience than some theoretical burbling. While I appreciate it, and to some extend it''s valid, for me again most important is actual experience. Going endlessly all over again, why ZFS is bad because you think fragmentation is a big issue, while most of the actual users don''t agree, is pointless imho. Instead, try to do something positive and practical - for all the time you spend on bashing ZFS, you probably would have already come up with some basic proof-of-concept of flawless RAID-5 in ZFS or fragmentation-free improvement, and once you''ve proved it actually is promising everyone would love you and help you polishing the code. :))))))) cyg> ZFS has other deficiencies, but they''re more fundamental choices cyg> involving poor trade-offs and lack of vision than outright (and cyg> easily rectifiable) flaws, so they could more justifiably be cyg> termed ''judgment calls'' and I haven''t delved as deeply into them. And what they are? What are the alternatives in a market? Whie ZFS is not perfect, and for some people lack of user quotes is no-go with ZFS, for many other it just doesn''t make sense to go with NetApp if only for economical reasons. Whatever theoretical deficiencies you have in mind, I myself, and many others, when confronted with ZFS in real world environments, I find it most of the time much more flexible in managing storage than LVM, XFS, UFS, VxVM/VxVF, NetApp. Also more secure, etc. And quite often similar with similar performance or even better. Then thanks to zfs send|recv I get really interesting backup option. What some people are also looking for, I guess, is a black-box approach - easy to use GUI on top of Solaris/ZFS/iSCSI/etc. So they don''t have to even know it''s ZFS or Solaris. Well... cyg> But they''re the main reason I have no interest in ''working on'' Well, you''re not using ZFS, you are not interested in working on it, all you are interested is finding some potential corner cases bad for ZFS and bashing it. If you put at least 10% of your energy you''re putting in your ''holy war'' you would at least provide some benchmarks (filebench?) showing these corner cases in comparison to other mind-blowing solutions on the market which are much better than ZFS, so we can all reproduce them and try to address ZFS problems. :)))) [...]>>And I seldom disappoint myself in that respect.Honestly, I believe you - no doubt about it. cyg> You really haven''t bothered to read much at all, have you. I''ve cyg> said, multiple times, that I came here initially in the hope of cyg> learning something interesting. 
More recently, I came here cyg> because I offered a more balanced assessment of ZFS''s strengths cyg> and weaknesses in responding to the Yager article and wanted to cyg> be sure that I had not treated ZFS unfairly in some way - which cyg> started this extended interchange. After that, I explained that cyg> while the likelihood of learning anything technical here was cyg> looking pretty poor, I didn''t particularly like some of the cyg> personal attacks that I''d been subject to and had decided to confront them. Well, every time I saw it was you ''attacking'' other people first. And it''s just your opinion that you offered more balanced assessment which is not shared by many. If you are not contributing here, and you are not learning here - wy are you here? I''m serious - why? Wouldn''t it better serve you to actually contribute to the other project, where developers actually get it - where no one is personally attacking you, where there are no fundamental bad choices made while in design, where RAID-5 is flawless, fragmentation problem doesn''t exist neither all the other corner cases. Performance is best in a market all the time, and I can run in on commodity HW or so called big iron, on a well known general purpose OS. Well, I assume that project is open source too - maybe you share with all of us that secret so we can join it too and forget about ZFS? I''m first to "convert" and forget about ZFS. Of course as that secret project is so perfect it probably doesn''t make sense to contribute as developer as there is nothing left to contribute - but, hey - I''m not a developer - I''m user and I''m definitely interested in that project. cyg> No, my attitude is that people too stupid and/or too lazy to cyg> understand what I *have* been delivering don''t deserve much respect if they complain. Maybe you should thing about that "stupid" part... Maybe, just maybe, it''s possible that all people around you don''t understand you, that world is wrong and we''re all so stupid. Well, maybe. Even if it is so, then perhaps it''s time to stop being Don Quixote and move on? -- Best regards, Robert mailto:rmilkowski at task.gda.pl http://milek.blogspot.com
On 11-Dec-07, at 9:44 PM, Robert Milkowski wrote:> Hello can, > ... > > What some people are also looking for, I guess, is a black-box > approach - easy to use GUI on top of Solaris/ZFS/iSCSI/etc. So they > don''t have to even know it''s ZFS or Solaris. Well...Pretty soon OS X will be exactly that - a native booting zero-admin ZFS-based system - as used by your grandmother on her iMac, your kid son on his iBook, etc ...> Wouldn''t it better serve you to actually contribute to the other > project, where developers actually get it - where no one is personally > attacking you, where there are no fundamental bad choices made while > in design, where RAID-5 is flawless, fragmentation problem doesn''t > exist neither all the other corner cases.And don''t forget - the perfect system doesn''t waste time checksumming! It''s unnecessary!> Performance is best in a > market all the time, and I can run in on commodity HW or so called big > iron, on a well known general purpose OS. Well, I assume that project > is open source too - maybe you share with all of us that secret so > we can > join it too and forget about ZFS? ... perhaps it''s time to stop > being Don Quixote > and move on?At least Sr Quixote was funny and never rude without provocation. --Toby> > > > > > > -- > Best regards, > Robert mailto:rmilkowski at task.gda.pl > http://milek.blogspot.com > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Tue, 11 Dec 2007, Robert Milkowski wrote:> Hello can, > > Tuesday, December 11, 2007, 6:57:43 PM, you wrote: > >>> Monday, December 10, 2007, 3:35:27 AM, you wrote: >>> >>> cyg> and it >>>>> made them slower >>> >>> cyg> That''s the second time you''ve claimed that, so you''ll really at >>> cyg> least have to describe *how* you measured this even if the >>> cyg> detailed results of those measurements may be lost in the mists of time. >>> >>> >>> cyg> So far you don''t really have much of a position to defend at >>> cyg> all: rather, you sound like a lot of the disgruntled TOPS users >>> cyg> of that era. Not that they didn''t have good reasons to feel >>> cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. >>> >>> cyg> Given that RMS really was *capable* of coming very close to the >>> cyg> performance capabilities of the underlying hardware, your >>> cyg> allegations just don''t ring true. Not being able to jump into >>> >>> And where is your "proof" that it "was capable of coming very close to >>> the..."? > > cyg> It''s simple: I *know* it, because I worked *with*, and *on*, it > cyg> - for many years. So when some bozo who worked with people with > cyg> a major known chip on their shoulder over two decades ago comes > cyg> along and knocks its capabilities, asking for specifics (not even > cyg> hard evidence, just specific allegations which could be evaluated > cyg> and if appropriate confronted) is hardly unreasonable. > > Bill, you openly criticize people (their work) who have worked on ZFS > for years... not that there''s anything wrong with that, just please > realize that because you were working on it it doesn''t mean it is/was > perfect - just the same as with ZFS. > I know, everyone loves their baby... > > Nevertheless just because you were working on and with it, it''s not a > proof. The person you were replaying to was also working with it (but > not on it I guess). Not that I''m interested in such a proof. Just > noticed that you''re demanding some proof, while you are also just > write some statements on its performance without any actual proof. > > > >>> Let me use your own words: >>> >>> "In other words, you''ve got nothing, but you''d like people to believe it''s something. >>> >>> The phrase "Put up or shut up" comes to mind." >>> >>> Where are your proofs on some of your claims about ZFS? > > cyg> Well, aside from the fact that anyone with even half a clue > cyg> knows what the effects of uncontrolled file fragmentation are on > cyg> sequential access performance (and can even estimate those > cyg> effects within moderately small error bounds if they know what > cyg> the disk characteristics are and how bad the fragmentation is), > cyg> if you''re looking for additional evidence that even someone > cyg> otherwise totally ignorant could appreciate there''s the fact that > > I''ve never said there are not fragmentation problems with ZFS. > Well, actually I''ve been hit by the issue in one environment. > Also you haven''t done your work home properly, as one of ZFS > developers actually stated they are going to work on ZFS > de-fragmentation and disk removal (pool shrinking). > See http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠 > Lukasz happens to be my friend who is also working with the same > environment. > > The point is, and you as a long time developer (I guess) should know it, > you can''t have everything done at once (lack of resources, and it takes > some time anyway) so you must prioritize. 
ZFS is open source and if > someone thinks that given feature is more important than the other > he/she should try to fix it or at least voice it here so ZFS > developers can possibly adjust their priorities if there''s good enough > and justified demand. > > Now the important part - quite a lot of people are using ZFS, from > desktop usage, their laptops, small to big production environments, > clustered environments, SAN environemnts, JBODs, entry-level to high-end arrays, > different applications, workloads, etc. And somehow you can''t find > many complaints about ZFS fragmentation. It doesn''t mean the problem > doesn''t exist (and I know it first hand) - it means that for whatever > reason for most people using ZFS it''s not a big problem if problem at > all. However they do have other issues and many of them were already > addressed or are being addressed. I would say that ZFS developers at > least try to listen to the community. > > Why am I asking for a proof - well, given constrains on resources, I > would say we (not that I''m ZFS developer) should focus on actual > problems people have with ZFS rather then theoretical problems (which > in some environments/workloads will show up and sooner or later they > will have to be addressed too). > > Then you find people like Pawel Jakub Davidek (guy who ported ZFS to > FreeBSD) who started experimenting with RAID-5 like implementation > with ZFS - he provided even some numbers showing it might be worth > looking at. That''s what community is about. > > I don''t see any point complaining about ZFS all over again - have you > actually run into the problem with ZFS yourself? I guess not. You just > assuming (correctly for some usage cases). I guess your message has > been well heard. Since you''re not interested in anything more that > bashing or complaining all the time about the same theoretical "issues" rather > than contributing somehow (even by providing some test results which > could be repeated) I wouldn''t wait for any positive feedback if I were > you - anyway, what kind of feedback are you waiting for? > > > cyg> Last I knew, ZFS was still claiming that it needed nothing like > cyg> defragmentation, while describing write allocation mechanisms > cyg> that could allow disastrous degrees of fragmentation under > cyg> conditions that I''ve described quite clearly. > > Well, I haven''t talked to ZFS (yet) so I don''t know what he claims :)) > If you are talking about ZFS developers then you can actually find > some evidence that they do see that problem and want to work on it. > Again see for example: http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠 > Bill, at least look at the list archives first. > > And again, "under conditions that I''ve described quite clearly." - > that''s exactly the problem. You''ve just described something while > others do have actual and real problems which should be addressed > first. > > > cyg> If ZFS made no > cyg> efforts whatsoever in this respect the potential for unacceptable > cyg> performance would probably already have been obvious even to its > cyg> blindest supporters, > > Well, is it really so hard to understand that a lot of people use ZFS > because it actually solves their problems? No matter what case > scenarios you will find to theoretically show some ZFS weaker points, > at the end what matters is if it does solve customer problems. And for > many users it does, definitely not for all of them. 
> I would argue that no matter what file system you will test or even > design, one can always find a corner cases when it will behave less > than optimal. For a general purpose file system what matters is that > in most common cases it''s good enough. > > > cyg> willy-nilly in its batch disk writes) - and a lot of the time > cyg> there''s probably not enough other system write activity to make > cyg> this infeasible, so that people haven''t found sequential > cyg> streaming performance to be all that bad most of the time > cyg> (especially on the read end if their systems are lightly load > cyg> ed and the fact that their disks may be working a lot harder > cyg> than they ought to have to is not a problem). > > Now you closer to the point. If the problem you are describing does > not hit most people, we should put more effort solving problems which > people are actually experiencing. > > > cyg> Then there''s RAID-Z, which smears individual blocks across > cyg> multiple disks in a manner that makes small-to-medium random > cyg> access throughput suck. Again, this is simple logic and physics: > cyg> if you understand the layout and the disk characteristics, you > cyg> can predict the effects on a heavily parallel workload with > cyg> fairly decent accuracy (I think that Roch mentioned this casually > cyg> at one point, so it''s hardly controversial, and I remember > cyg> reading a comment by Jeff Bonwick that he was pleased with the > cyg> result of one benchmark - which made no effort to demonstrate the > cyg> worst case - because the throughput penalty was ''only'' a factor > cyg> of 2 rather than the full factor of N). > > Yeah, nothing really new here. If you need a guy from Sun, then read > Roch''s post on RAID-Z performance. Nothing you''ve discovered here. > Nevertheless RAID-Z[2] is good enough for many people. > I know that simple logic and physics states that relativity equations > provide better accuracy than Newton''s - nevertheless in most scenarios > I''m dealing with it doesn''t really matter from a practical point of > view. > > Then, in some environments RAID-Z2 (on JBOD) actually provides better > performance than RAID-5 (and HW R5 for that matter). And, opposite > to you, I''m not speculating but I''ve been working with such > environment (lot of concurrent writes which are more critical than > much less reads later). > So when you saying that RAID-Z is brain-damaging - well, it''s > mostly positive experience of a lot of people with RAID-Z vs. your statement without any > real-world backing. > > Then, of course, one can produce a test or even real work environment > where there will be lot of small and simultaneous reads, without any > writes, from a dataset much bigger than available memory and RAID-Z would be much slower than > RAID-5. Not that it''s novel - it was well discussed since RAID-Z was > introduced. That''s one of the reasons why people use ZFS on-top of > HW RAID-5 luns from time to time. > > > cyg> And the way ZFS aparently dropped the ball on its alleged > cyg> elimination of any kind of ''volume management'' by requiring that > cyg> users create explicit (and matched) aggregations of disks to > cyg> support mirroring and RAID-Z. 
> # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4
> # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4
> # zpool list
> NAME    SIZE   USED   AVAIL   CAP  HEALTH  ALTROOT
> bill    373M    90K    373M    0%  ONLINE  -
> #
> # mkfile 128m f11 ; mkfile 256m f44
> # zpool destroy bill
> # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44
> # zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> bill   101K   715M  32.6K  /bill
> #
> (2*128+2*256=768) - looks fine.
>
> If you are talking about a solution which enables a user to mix > different disk sizes in the same mirror or RAID-5 group and, while > providing the given protection the whole time, allows you to utilize 100% of all > disk capacities.... well, what is that solution? Is it free? > Open source? Available on a general-purpose OS? On commodity HW? > Available at all? :P > > > cyg> Now, if someone came up with any kind of credible rebuttal to > cyg> these assertions we could at least discuss it on technical > cyg> grounds. But (and again you should consider this significant) no > cyg> one has: all we have is well-reasoned analysis on the one hand > cyg> and some (often fairly obnoxious) fanboy babble on the other. If > cyg> you step back, make the effort required to *understand* that > cyg> analysis, and try to look at the situation objectively, which do you find more credible? > > More credible to me is actual user experience than some theoretical > burbling. While I appreciate it, and to some extent it''s valid, for me, > again, what matters most is actual experience. Going on endlessly, all over > again, about why ZFS is bad because you think fragmentation is a big issue, > while most of the actual users don''t agree, is pointless imho. > > Instead, try to do something positive and practical - in all the time > you spend on bashing ZFS, you probably would have already come up with > some basic proof-of-concept of a flawless RAID-5 in ZFS or a > fragmentation-free improvement, and once you''ve proved it actually is > promising, everyone would love you and help you polish the code. > > :))))))) > > > cyg> ZFS has other deficiencies, but they''re more fundamental choices > cyg> involving poor trade-offs and lack of vision than outright (and > cyg> easily rectifiable) flaws, so they could more justifiably be > cyg> termed ''judgment calls'' and I haven''t delved as deeply into them. > > And what are they? What are the alternatives in the market? > While ZFS is not perfect, and for some people the lack of user quotas is a > no-go with ZFS, for many others it just doesn''t make sense to go with > NetApp if only for economic reasons. > > Whatever theoretical deficiencies you have in mind, I myself, and many > others, when confronted with ZFS in real-world environments, find it > most of the time much more flexible in managing storage than LVM, XFS, > UFS, VxVM/VxFS, NetApp. Also more secure, etc. And quite often with > similar performance or even better. Then, thanks to zfs send|recv, > I get a really interesting backup option. > > What some people are also looking for, I guess, is a black-box > approach - an easy-to-use GUI on top of Solaris/ZFS/iSCSI/etc. So they > don''t even have to know it''s ZFS or Solaris. Well... > > > cyg> But they''re the main reason I have no interest in ''working on'' > > Well, you''re not using ZFS, you are not interested in working on it, > all you are interested in is finding some potential corner cases bad for > ZFS and bashing it. 
If you put at least 10% of the energy you''re > putting into your ''holy war'' into it, you would at least provide some benchmarks > (filebench?) showing these corner cases in comparison to other > mind-blowing solutions on the market which are much better than ZFS, > so we can all reproduce them and try to address ZFS problems. > > :)))) > > > [...] >>> And I seldom disappoint myself in that respect. > > Honestly, I believe you - no doubt about it. > > > cyg> You really haven''t bothered to read much at all, have you. I''ve > cyg> said, multiple times, that I came here initially in the hope of > cyg> learning something interesting. More recently, I came here > cyg> because I offered a more balanced assessment of ZFS''s strengths > cyg> and weaknesses in responding to the Yager article and wanted to > cyg> be sure that I had not treated ZFS unfairly in some way - which > cyg> started this extended interchange. After that, I explained that > cyg> while the likelihood of learning anything technical here was > cyg> looking pretty poor, I didn''t particularly like some of the > cyg> personal attacks that I''d been subject to and had decided to confront them. > > Well, every time I saw it, it was you ''attacking'' other people first. > And it''s just your opinion that you offered a more balanced assessment, > which is not shared by many. > > If you are not contributing here, and you are not learning here - why > are you here? I''m serious - why? > Wouldn''t it better serve you to actually contribute to the other > project, where the developers actually get it - where no one is personally > attacking you, where there are no fundamental bad choices made in the > design, where RAID-5 is flawless and the fragmentation problem doesn''t > exist, nor do all the other corner cases. Performance is the best in the > market all the time, and I can run it on commodity HW or so-called big > iron, on a well-known general-purpose OS. Well, I assume that project > is open source too - maybe you could share that secret with all of us so we can > join it too and forget about ZFS? I''d be the first to "convert" and forget > about ZFS. Of course, as that secret project is so perfect, it probably > doesn''t make sense to contribute as a developer, as there is nothing left > to contribute - but, hey - I''m not a developer - I''m a user, and I''m > definitely interested in that project. > > cyg> No, my attitude is that people too stupid and/or too lazy to > cyg> understand what I *have* been delivering don''t deserve much respect if they complain. > > Maybe you should think about that "stupid" part... > Maybe, just maybe, it''s possible that all the people around you don''t > understand you, that the world is wrong and we''re all so stupid. Well, > maybe. Even if it is so, then perhaps it''s time to stop being Don Quixote > and move on?

+1 Well said Robert!

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from "sugar-coating school"? Sorry - I never attended! :)
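Since the question quoted above - what solution lets you mix different disk sizes in one redundancy group and still use 100% of the capacity while keeping the protection level - keeps recurring in this thread, here is a minimal sketch of one "relatively simple distribution strategy": place each mirrored block on the two devices that currently have the most free space, with no fixed vdev pairing. The device sizes, the block-level bookkeeping and the two-way-mirror policy are illustrative assumptions for the sketch, not a description of ZFS or of any shipping product.

  import heapq

  def place_mirrored_blocks(disk_sizes, blocks_wanted):
      # Greedy per-block placement: both copies of each block go to the two
      # devices that currently have the most free space.  With mixed device
      # sizes this keeps filling until free space genuinely runs out, instead
      # of being capped by the smallest member of a fixed mirror pair.
      free = [-size for size in disk_sizes]          # max-heap via negation
      heapq.heapify(free)
      placed = 0
      for _ in range(blocks_wanted):
          a = heapq.heappop(free)                    # device with most free space
          b = heapq.heappop(free)                    # second most
          if -b < 1:                                 # no two devices with room left
              heapq.heappush(free, a)
              heapq.heappush(free, b)
              break
          heapq.heappush(free, a + 1)                # one block consumed on each
          heapq.heappush(free, b + 1)
          placed += 1
      return placed

  # Hypothetical pool: 128, 128, 256, 256 and 500 "blocks" worth of devices.
  sizes = [128, 128, 256, 256, 500]
  print(place_mirrored_blocks(sizes, 10**6), "of", sum(sizes) // 2)   # -> 634 of 634

With those sizes the greedy policy places 634 mirrored blocks - exactly half the raw total, which is the best any two-way mirror can do - and it keeps working as long as no single device holds more free space than all the others combined.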
> Hello can, > > Tuesday, December 11, 2007, 6:57:43 PM, you wrote: > >>> Monday, December 10, 2007, 3:35:27 AM, you wrote: >>> >>> cyg> and it >>>>> made them slower >>> cyg> That''s the second time you''ve claimed that, so you''ll really at >>> cyg> least have to describe *how* you measured this even if the >>> cyg> detailed results of those measurements may be lost in the mists of time. >>> >>> >>> cyg> So far you don''t really have much of a position to defend at >>> cyg> all: rather, you sound like a lot of the disgruntled TOPS users >>> cyg> of that era. Not that they didn''t have good reasons to feel >>> cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. >>> >>> cyg> Given that RMS really was *capable* of coming very close to the >>> cyg> performance capabilities of the underlying hardware, your >>> cyg> allegations just don''t ring true. Not being able to jump into >>> >>> And where is your "proof" that it "was capable of coming very close to >>> the..."? > > cyg> It''s simple: I *know* it, because I worked *with*, and *on*, it > cyg> - for many years. So when some bozo who worked with people with > cyg> a major known chip on their shoulder over two decades ago comes > cyg> along and knocks its capabilities, asking for specifics (not even > cyg> hard evidence, just specific allegations which could be evaluated > cyg> and if appropriate confronted) is hardly unreasonable. > > Bill, you openly criticize people (their work) who have worked on ZFS > for years... not that there''s anything wrong with that, just please > realize that because you were working on it it doesn''t mean it is/was > perfect - just the same as with ZFS.Of course it doesn''t - and I never claimed that RMS was anything close to ''perfect'' (I even gave specific examples of areas in which it was *far* from perfect). Just as I''ve given specific examples of where ZFS is far from perfect. What I challenged was David''s assertion that RMS was severely deficient in its *capabilities* - and demanded not ''proof'' of any kind but only specific examples (comparable in specificity to the examples of ZFS''s deficiencies that *I* have provided) that could actually be discussed.> I know, everyone loves their baby...No, you don''t know: you just assume that everyone is as biased as you and others here seem to be.> > Nevertheless just because you were working on and with it, it''s not a > proof. The person you were replaying to was also working with it (but > not on it I guess). Not that I''m interested in such a proof. Just > noticed that you''re demanding some proof, while you are also just > write some statements on its performance without any actual proof.You really ought to spend a lot more time understanding what you''ve read before responding to it, Robert. I *never* asked for anything like ''proof'': I asked for *examples* specific enough to address - and repeated that explicitly in responding to your previous demand for ''proof''. Perhaps I should at that time have observed that your demand for ''proof'' (your use of quotes suggesting that it was something that *I* had demanded) was ridiculous, but I thought my response made that obvious.> > > >>> Let me use your own words: >>> >>> "In other words, you''ve got nothing, but you''d like people to believe it''s something. >>> >>> The phrase "Put up or shut up" comes to mind." >>> >>> Where are your proofs on some of your claims about ZFS? 
> > cyg> Well, aside from the fact that anyone with even half a clue > cyg> knows what the effects of uncontrolled file fragmentation are on > cyg> sequential access performance (and can even estimate those > cyg> effects within moderately small error bounds if they know what > cyg> the disk characteristics are and how bad the fragmentation is), > cyg> if you''re looking for additional evidence that even someone > cyg> otherwise totally ignorant could appreciate there''s the fact that > > I''ve never said there are not fragmentation problems with ZFS.Not having made a study of your collected ZFS contributions here I didn''t know that. But some of ZFS''s developers are on record stating that they believe there is no need to defragment (unless they''ve changed their views since and not bothered to make us aware of it), and in the entire discussion in the recent ''ZFS + DB + "fragments"'' thread there were only three contributors (Roch, Anton, and I) who seemed willing to admit that any problem existed. So since one of my ''claims'' for which you requested substantiation involved fragmentation problems, it seemed appropriate to address them.> Well, actually I''ve been hit by the issue in one environment.But didn''t feel any impulse to mention that during all the preceding discussion, I guess.> Also you haven''t done your work home properly, as one of ZFS > developers actually stated they are going to work on ZFS > de-fragmentation and disk removal (pool shrinking). > See http://www.opensolaris.org/jive/thread.jspa?messageID=139680??Hmmm - there were at least two Sun ZFS personnel participating in the database thread, and they never mentioned this. I guess they didn''t do their ''work home'' properly either (and unlike me they''re paid to do it). As for me, my commitment here is too limited for me to have even scanned the entire thread list, let alone read discussions with names like "ZFS send needs optimalization" that seem unlikely to be relevant to my particular interests.> Lukasz happens to be my friend who is also working with the same > environment.That just might help explain why you happened to be aware of this obscure little tidbit of information, then.> > The point is, and you as a long time developer (I guess) should know it, > you can''t have everything done at once (lack of resources, and it takes > some time anyway) so you must prioritize.The issues here are not issues of prioritization but issues of denial. Your citation above is the first suggestion that I''ve seen (and by all appearances the first that anyone else participating in these discussions has seen) that the ZFS crew considers the fragmentation issue important enough to merit active attention in the future. Do you by any chance have any similar hint of recognition that RAID-Z might benefit from revamping as well? 
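As an aside, for anyone trying to picture the mechanism under dispute in that database thread: the toy model below (emphatically not ZFS''s actual allocator - the block granularity, the rewrite pattern and the "allocate at the current frontier" rule are all simplifying assumptions) writes a file contiguously, then rewrites random blocks copy-on-write, and reports what fraction of logically adjacent blocks remain physically adjacent for a later sequential read.

  import random

  def adjacency_after_cow(file_blocks=10000, rewrites=5000, seed=1):
      # Toy copy-on-write model.  Block i of a freshly written file lives at
      # physical address i; every rewrite of a random logical block allocates
      # a brand-new physical address at the frontier (never overwrites in place).
      random.seed(seed)
      where = list(range(file_blocks))          # logical block -> physical address
      frontier = file_blocks
      for _ in range(rewrites):
          blk = random.randrange(file_blocks)
          where[blk] = frontier
          frontier += 1
      # Fraction of logical neighbours that are still physical neighbours,
      # i.e. boundaries a sequential read can cross without a seek.
      still_adjacent = sum(1 for i in range(file_blocks - 1)
                           if where[i + 1] == where[i] + 1)
      return still_adjacent / (file_blocks - 1)

  for n in (0, 1000, 5000, 20000):
      print(n, "rewrites ->", round(adjacency_after_cow(rewrites=n), 3))

Run as-is it drops from 1.0 toward a few percent as rewrites accumulate, which is the effect the sequential-read arguments here are about; a real allocator, free-space reuse and caching all soften the picture, which is exactly why measurements on real workloads matter.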
ZFS is open source and if> someone thinks that given feature is more important than the other > he/she should try to fix it or at least voice it here so ZFS > developers can possibly adjust their priorities if there''s good enough > and justified demand.That just won''t wash, Robert: as I noted above, the problem here has been denial that these are flaws at all, not just a debate about how to ''prioritize'' addressing them (though in the case of RAID-Z I recall seeing some indication that at least one person was interested in - or perhaps actually is - working on RAID-5-like support because they see problems with RAID-Z).> > Now the important part - quite a lot of people are using ZFS, from > desktop usage, their laptops, small to big production environments, > clustered environments, SAN environemnts, JBODs, entry-level to high-end arrays, > different applications, workloads, etc. And somehow you can''t find > many complaints about ZFS fragmentation.The entire basis for that database thread (initiated by someone else, you will note) was ZFS fragmentation, and a great deal of its content arose from the resistance of many here to the idea that it might constitute a problem *in that specific environment* (let alone more generally). Most environments actually aren''t all that performance-sensitive, so of course they don''t complain. Even if they run into problems, they just buy more hardware - because that''s what they''re used to doing: the idea that better software could eliminate the need to do so either doesn''t cross their minds at all or seems like too much of a pipe dream to take seriously. Trouble is, ZFS and its fanboys tout it as offering *superior* - not merely adequate - performance, whereas for some not-all-that-uncommon situations its performance can be worse *by over an order of magnitude* due to the fragmention which is designed into its operation and for which no current relief is available (nor was any relief apparently generally known to be projected for the future, until now). The fact that many installations may be able to laugh off an order-of-magnitude performance handicap is not the point: the point is that if the claims for ZFS had been more balanced in this area, I''d have far less to criticize - I''d just observe that there was significant room for improvement and leave it at that. ...> Then you find people like Pawel Jakub Davidek (guy who ported ZFS to > FreeBSD) who started experimenting with RAID-5 like implementation > with ZFS - he provided even some numbers showing it might be worth > looking at. That''s what community is about.Ah - that may be what I was recalling above. Strange, once again, that it never popped up in the current discussions until now.> > I don''t see any point complaining about ZFS all over again - have you > actually run into the problem with ZFS yourself? I guess not.I haven''t been sent to Guantanamo and held for years without trial, either - but that doesn''t mean that I have no business criticizing the practice, and in particular persisting if that criticism is met with denial that any problem exists (even though indeed it''s not *my* problem). You just> assuming (correctly for some usage cases). I guess your message has > been well heard.But hardly well understood. 
Since you''re not interested in anything more that> bashing or complaining all the time about the same theoretical "issues" rather > than contributing somehow (even by providing some test results which > could be repeated)I''ve told you what I''m doing, and why I''m doing it, and why it''s beyond stupid to complain about the lack of ''test results'' in situations as clear-cut as these are, and how to go about fixing them - and you still come back with crap like this. Is it any wonder that my respect for so many of you is close to zero? I wouldn''t wait for any positive feedback if I were> you - anyway, what kind of feedback are you waiting for?I''m waiting for the idiots either to shut up or to shape up. And I remain sufficiently (though now verging on perversely) curious about just how long that will take to keep working on it.> > > cyg> Last I knew, ZFS was still claiming that it needed nothing like > cyg> defragmentation, while describing write allocation mechanisms > cyg> that could allow disastrous degrees of fragmentation under > cyg> conditions that I''ve described quite clearly. > > Well, I haven''t talked to ZFS (yet) so I don''t know what he claims :))Perhaps you should do *your* ''work home'' more properly, then: there are several developers who have presumed to speak for ZFS over the years, and their statements are well documented (you could start with the presentations that they''ve made).> If you are talking about ZFS developers then you can actually find > some evidence that they do see that problem and want to work on it. > Again see for example: http://www.opensolaris.org/jive/thread.jspa?messageID=139680?? > Bill, at least look at the list archives first.I believe that I covered that adequately above, Robert. But given your demonstrated inability to absorb information even after several repetitions, I''ll suggest that you simply keep working on understanding it (and the rest of this response) until you actually *do* understand it, before attempting to reply to it.> > And again, "under conditions that I''ve described quite clearly." - > that''s exactly the problem. You''ve just described something while > others do have actual and real problems which should be addressed > first.Once again, you are confusing the very real problem of stone-wall denial here with a simple issue of prioritization.> > > cyg> If ZFS made no > cyg> efforts whatsoever in this respect the potential for unacceptable > cyg> performance would probably already have been obvious even to its > cyg> blindest supporters, > > Well, is it really so hard to understand that a lot of people use ZFS > because it actually solves their problems?Not at all: it''s just far from obvious that it solves their problems any (let alone significantly) better than other existing open source options. And that would not be any issue if some people here weren''t so zealous in asserting ZFS''s alleged stunning superiority - but if they continue to do so, I''ll continue to challenge them to *substantiate* that claim. No matter what case> scenarios you will find to theoretically show some ZFS weaker points, > at the end what matters is if it does solve customer problems. And for > many users it does, definitely not for all of them. > I would argue that no matter what file system you will test or even > design, one can always find a corner cases when it will behave less > than optimal. 
For a general purpose file system what matters is that > in most common cases it''s good enough.And if "It''s good enough" were all that people were claiming about ZFS there''d be very little to dispute (though no less room for improvement, of course - and probably a great deal less resistance to suggestions of how to go about it). ...> cyg> Then there''s RAID-Z, which smears individual blocks across > cyg> multiple disks in a manner that makes small-to-medium random > cyg> access throughput suck. Again, this is simple logic and physics: > cyg> if you understand the layout and the disk characteristics, you > cyg> can predict the effects on a heavily parallel workload with > cyg> fairly decent accuracy (I think that Roch mentioned this casually > cyg> at one point, so it''s hardly controversial, and I remember > cyg> reading a comment by Jeff Bonwick that he was pleased with the > cyg> result of one benchmark - which made no effort to demonstrate the > cyg> worst case - because the throughput penalty was ''only'' a factor > cyg> of 2 rather than the full factor of N). > > Yeah, nothing really new here. If you need a guy from Sun, then read > Roch''s post on RAID-Z performance. Nothing you''ve discovered here.Hmmm. I took a quick look through Roch''s posts here and didn''t find a title that suggested such a topic (though he does tend to get involved in discussions that are also of interest to me, so the time wasn''t completely wasted). If you''re referring to his mid-2006 blog post, had you read the discussion that followed it you would have found that I participated actively and in fact raised many of the same issues that I''ve raised again here (points that he either hadn''t covered or hadn''t realized had alternatives that did not suffer from comparable limitations, plus more general observations on the fragmentation problem). Incidentally (since comments to that post are now closed), his IOPS calculation at the end was flawed: the formula he presented yielded not the number of disks to use in each group but the number of groups to use.> Nevertheless RAID-Z[2] is good enough for many people. > I know that simple logic and physics states that relativity equations > provide better accuracy than Newton''s - nevertheless in most scenarios > I''m dealing with it doesn''t really matter from a practical point of > view.Given your expressed preference for ''real problems'' above, it''s worth noting that in my quick scan through Roch''s posts here I happened upon this (referring to performance issues using RAID-Z): "Now I have to find a way to justify myself with my head office that after spending 100k+ in hw and migrating to "the most advanced OS" we are running about 8 time slower :)" Some people might consider such a problem to be ''real'' (and somewhat personal as well); he goes on to observe that "while that rsync process is running, ZONEX is completely unusable because of the rsync I/O load" - another ''real-world'' indication of how excessive (and unnecessary) RAID-Z disk loading compromises other aspects of system performance (though limited scheduling intelligence may have contributed to this as well). Since I stumbled upon that without even looking for it or scanning more than a minute fraction of 1% of the posts here, there''s an excellent possibility that considerably more such are lurking elsewhere in this forum (want to do some ''work home'' and find out?).> > Then, in some environments RAID-Z2 (on JBOD) actually provides better > performance than RAID-5 (and HW R5 for that matter). 
And, opposite > to you, I''m not speculating but I''ve been working with such > environment (lot of concurrent writes which are more critical than > much less reads later).Don''t confuse apples with oranges. As long as it can accumulate enough dirty data before it has to flush it to disk, COW with batch write-back can make *any* write strategy work well. So there''s no need to accept the brain-damaged nature of RAID-Z''s performance with small-to-medium-sized random accesses in order to obtain the good performance that you describe above: a good ZFS RAID-5-like implementation could do just as well for those workloads *plus* beat both conventional RAID-5 and RAID-Z at small-update workloads *plus* cremate RAID-Z in terms of throughput on small-to-medium read workloads. The main limitation of the straight-forward way to implement this is that it would only be easily applicable to multi-block files, because each stripe could contain data from only one file (so as to avoid an additional level of access indirection); of course, in principle you could stripe a file as small as four disk sectors (2 KB) across 4 disks plus one for parity, so this approach would be inapplicable only to *tiny* files - around the size that one might start considering embedding in their disk inode, given a design that allowed that flexibility. While small files may get a large share of the access load in some environments, in most environments they consume only a small proportion of the storage space, so just leaving them to be mirrored would probably be an eminently viable strategy - and exploring more interesting alternatives wouldn''t be productive anyway until you''ve managed to understand the basic one.> So when you saying that RAID-Z is brain-damaging - well, it''s > mostly positive experience of a lot of people with RAID-Z vs. your statement without any > real-world backing.I just provided one example above from a participant in this forum (and it seems unlikely that it''s the only one). Does that mean that I get to accuse you of not having "done your work home properly", because you were unaware of it? ...> cyg> And the way ZFS aparently dropped the ball on its alleged > cyg> elimination of any kind of ''volume management'' by requiring that > cyg> users create explicit (and matched) aggregations of disks to > cyg> support mirroring and RAID-Z. > > # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4 > # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4 > # zpool list > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > bill 373M 90K 373M 0% ONLINE - > # > # mkfile 128m f11 ; mkfile 256m f44 > # zpool destroy bill > # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44 > # zfs list > NAME USED AVAIL REFER MOUNTPOINT > bill 101K 715M 32.6K /bill > # > (2*128+2*256=768) - looks fine. > > If you are talking about a solution which enables user to mix > different disk sizes in the same mirror or RAID-5 group and while all > the time providing given protection allows you to utilize 100% of all > disk capacities.... well, what is that solution? Is it free? > Open source? Available on general purpose OS? Or commodity HW? > Available at all? 
:P

I''m talking about what ZFS *could* have provided to make good on their claim that they had eliminated (or at least effectively hidden) volume-management: a *real* ''storage pool'' that just accepted whatever disks you gave it and could be used transparently to provide whatever form of redundancy was desired on a per-file basis, with the ability to add or remove individual disks at will. No need to create separate pools for non-redundant data, mirrors, parity RAID, etc.: it would ''just work'', in the manner that some people would like to claim ZFS already does (and to some degree perhaps it actually does, but not when it comes to redundant storage). And yes, across a very wide range of disk-size variations it''s possible to utilize 100% of the capacity of each individual disk in such a pool using relatively simple distribution strategies - especially if you can perform very minor rearrangements to cover corner cases (though ZFS-style snapshots would hinder that, which is one of the reasons - defragmentation being another, and rebalancing across multiple nodes being a third - that I favor a different snapshot approach). I described this here well over a year ago, and Bill Moore said they had actually considered it but had shelved it for various reasons (none of which appeared insurmountable - but he may have been making different assumptions about how it could be implemented).

> cyg> Now, if someone came up with any kind of credible rebuttal to > cyg> these assertions we could at least discuss it on technical > cyg> grounds. But (and again you should consider this significant) no > cyg> one has: all we have is well-reasoned analysis on the one hand > cyg> and some (often fairly obnoxious) fanboy babble on the other. If > cyg> you step back, make the effort required to *understand* that > cyg> analysis, and try to look at the situation objectively, which do you find more credible? > > More credible to me is actual user experience than some theoretical > burbling.

That''s usually the case with amateurs who have difficulty understanding in detail how the systems that they use work. But at least many of them have the sense not to argue interminably with people who have actually designed and built such systems and *do* understand them in (excruciating) detail. ...

> cyg> ZFS has other deficiencies, but they''re more fundamental choices > cyg> involving poor trade-offs and lack of vision than outright (and > cyg> easily rectifiable) flaws, so they could more justifiably be > cyg> termed ''judgment calls'' and I haven''t delved as deeply into them. > > And what are they?

Once again, you''ve failed to do your ''work home'' - since I''ve mentioned them here previously:

1. Implementation tied to a centralized server - scales only ''up'', not ''out''.

2. Snapshot mechanism that makes reorganization expensive (including reorganization across nodes - so it''s a scaling impediment as well as a performance trade-off).

3. Explicit pointer (indirect block) trees for large files rather than a flatter mechanism that avoids deep tree look-ups (with high-level data distribution handled algorithmically - which also helps avoid the need to update pointers in bulk when inter-node rebalancing operations occur and confines pointer updates on writes to the node that holds the data).

4.
Trying to use block size to manage both access granularity and on-disk contiguity for performance (though background reorganization could help the latter and leave the former free to adjust just for access granularity - so that design choice could be considered one of the flaws already discussed above). There were probably more, but as you likely wouldn''t understand them any better than you''ve understood anything else there''s little point in dredging them up again. ...> cyg> But they''re the main reason I have no interest in ''working on'' > > Well, you''re not using ZFS, you are not interested in working on it, > all you are interested is finding some potential corner cases bad for > ZFS and bashing it. If you put at least 10% of your energy you''re > putting in your ''holy war'' you would at least provide some benchmarks > (filebench?) showing these corner cases in comparison to other > mind-blowing solutions on the market which are much better than ZFS, > so we can all reproduce them and try to address ZFS problems.I really don''t have much interest in meeting *your* criteria for being convinced, Robert - at least in part because it''s not clear that *anything* would convince you. So it''s more fun to see how completely committed people like you are to keeping their heads firmly wedged up where the sun don''t shine to avoid actually facing up to the fact that ZFS just ain''t quite what you thought it was. ...> cyg> You really haven''t bothered to read much at all, have you. I''ve > cyg> said, multiple times, that I came here initially in the hope of > cyg> learning something interesting. More recently, I came here > cyg> because I offered a more balanced assessment of ZFS''s strengths > cyg> and weaknesses in responding to the Yager article and wanted to > cyg> be sure that I had not treated ZFS unfairly in some way - which > cyg> started this extended interchange. After that, I explained that > cyg> while the likelihood of learning anything technical here was > cyg> looking pretty poor, I didn''t particularly like some of the > cyg> personal attacks that I''d been subject to and had decided to confront them. > > Well, every time I saw it was you ''attacking'' other people first.Then you obviously missed a great many posts, but given the readily-apparent quality of your other research I don''t find that surprising at all. ...> If you are not contributing here, and you are not learning here - wy > are you here? I''m serious - why?I explained that, in detail, in my previous post. Given the expressed ''seriousness'' of your repeat question here I was going to ask whether you are functionally illiterate, but your advice below brought up an another possibility. ...> cyg> No, my attitude is that people too stupid and/or too lazy to > cyg> understand what I *have* been delivering don''t deserve much respect if they complain. > > Maybe you should thing about that "stupid" part...As usual, I thought about it *before* I said it. However, I did inadvertently omit a third possibility - that people such as you (who don''t quite strike me as being abjectly stupid or drop-dead lazy) are instead simply too intellectually dishonest (whether intentionally or so habitually that it has become subconscious) to understand what I''ve been ''delivering''. 
So you''re right: there''s always room to refine one''s understanding, and another relevant quotation comes to mind ("There are none so blind as those who will not see").> Maybe, just maybe, it''s possible that all people around you don''t > understand you, that world is wrong and we''re all so stupid. Well, > maybe. Even if it is so, then perhaps it''s time to stop being Don Quixote > and move on?No, but it might be getting close to it - I''ll let you know. - bill This message posted from opensolaris.org
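For what it is worth, the arithmetic behind the "factor of N" dispute quoted above is simple enough to sketch. Assuming the idealized case - a RAID-Z read of a single block touches every data disk in its group, a RAID-5-style small read touches one disk, and caching and queuing are ignored - a back-of-the-envelope model looks like this (the disk count, group width and per-disk IOPS figures are made-up examples):

  def raidz_random_read_iops(total_disks, group_width, per_disk_iops):
      # Each small read occupies every disk in its group (the block is spread
      # across the whole group), so a group delivers roughly one disk's worth
      # of random-read IOPS and the pool scales with the number of groups.
      groups = total_disks // group_width
      return per_disk_iops * groups

  def raid5_style_random_read_iops(total_disks, per_disk_iops):
      # Each small read is served by a single disk (rotated parity means the
      # parity disks take reads for other stripes), so all spindles count.
      return per_disk_iops * total_disks

  disks, width, iops = 12, 6, 150                    # hypothetical 12-disk JBOD
  print("RAID-Z-like :", raidz_random_read_iops(disks, width, iops), "IOPS")   # 300
  print("RAID-5-like :", raid5_style_random_read_iops(disks, iops), "IOPS")    # 1800

The ratio is roughly the group width, and it only applies to reads small enough that a single disk could have satisfied them anyway; large streaming reads want the whole stripe regardless, which is part of why the two sides of this argument keep talking past each other.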
(apologies if this gets posted twice - it disappeared the first time, and it''s not clear whether that was intentional)> Hello can, > > Tuesday, December 11, 2007, 6:57:43 PM, you wrote: > >>> Monday, December 10, 2007, 3:35:27 AM, you wrote: >>> >>> cyg> and it >>>>> made them slower >>> cyg> That''s the second time you''ve claimed that, so you''ll really at >>> cyg> least have to describe *how* you measured this even if the >>> cyg> detailed results of those measurements may be lost in the mists of time. >>> >>> >>> cyg> So far you don''t really have much of a position to defend at >>> cyg> all: rather, you sound like a lot of the disgruntled TOPS users >>> cyg> of that era. Not that they didn''t have good reasons to feel >>> cyg> disgruntled - but they frequently weren''t very careful about aiming their ire accurately. >>> >>> cyg> Given that RMS really was *capable* of coming very close to the >>> cyg> performance capabilities of the underlying hardware, your >>> cyg> allegations just don''t ring true. Not being able to jump into >>> >>> And where is your "proof" that it "was capable of coming very close to >>> the..."? > > cyg> It''s simple: I *know* it, because I worked *with*, and *on*, it > cyg> - for many years. So when some bozo who worked with people with > cyg> a major known chip on their shoulder over two decades ago comes > cyg> along and knocks its capabilities, asking for specifics (not even > cyg> hard evidence, just specific allegations which could be evaluated > cyg> and if appropriate confronted) is hardly unreasonable. > > Bill, you openly criticize people (their work) who have worked on ZFS > for years... not that there''s anything wrong with that, just please > realize that because you were working on it it doesn''t mean it is/was > perfect - just the same as with ZFS.Of course it doesn''t - and I never claimed that RMS was anything close to ''perfect'' (I even gave specific examples of areas in which it was *far* from perfect). Just as I''ve given specific examples of where ZFS is far from perfect. What I challenged was David''s assertion that RMS was severely deficient in its *capabilities* - and demanded not ''proof'' of any kind but only specific examples (comparable in specificity to the examples of ZFS''s deficiencies that *I* have provided) that could actually be discussed.> I know, everyone loves their baby...No, you don''t know: you just assume that everyone is as biased as you and others here seem to be.> > Nevertheless just because you were working on and with it, it''s not a > proof. The person you were replaying to was also working with it (but > not on it I guess). Not that I''m interested in such a proof. Just > noticed that you''re demanding some proof, while you are also just > write some statements on its performance without any actual proof.You really ought to spend a lot more time understanding what you''ve read before responding to it, Robert. I *never* asked for anything like ''proof'': I asked for *examples* specific enough to address - and repeated that explicitly in responding to your previous demand for ''proof''. Perhaps I should at that time have observed that your demand for ''proof'' (your use of quotes suggesting that it was something that *I* had demanded) was ridiculous, but I thought my response made that obvious.> > > >>> Let me use your own words: >>> >>> "In other words, you''ve got nothing, but you''d like people to believe it''s something. >>> >>> The phrase "Put up or shut up" comes to mind." 
>>> >>> Where are your proofs on some of your claims about ZFS? > > cyg> Well, aside from the fact that anyone with even half a clue > cyg> knows what the effects of uncontrolled file fragmentation are on > cyg> sequential access performance (and can even estimate those > cyg> effects within moderately small error bounds if they know what > cyg> the disk characteristics are and how bad the fragmentation is), > cyg> if you''re looking for additional evidence that even someone > cyg> otherwise totally ignorant could appreciate there''s the fact that > > I''ve never said there are not fragmentation problems with ZFS.Not having made a study of your collected ZFS contributions here I didn''t know that. But some of ZFS''s developers are on record stating that they believe there is no need to defragment (unless they''ve changed their views since and not bothered to make us aware of it), and in the entire discussion in the recent ''ZFS + DB + "fragments"'' thread there were only three contributors (Roch, Anton, and I) who seemed willing to admit that any problem existed. So since one of my ''claims'' for which you requested substantiation involved fragmentation problems, it seemed appropriate to address them.> Well, actually I''ve been hit by the issue in one environment.But didn''t feel any impulse to mention that during all the preceding discussion, I guess.> Also you haven''t done your work home properly, as one of ZFS > developers actually stated they are going to work on ZFS > de-fragmentation and disk removal (pool shrinking). > See http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠Hmmm - there were at least two Sun ZFS personnel participating in the database thread, and they never mentioned this. I guess they didn''t do their ''work home'' properly either (and unlike me they''re paid to do it). As for me, my commitment here is too limited for me to have even scanned the entire thread list, let alone read discussions with names like "ZFS send needs optimalization" that seem unlikely to be relevant to my particular interests.> Lukasz happens to be my friend who is also working with the same > environment.That just might help explain why you happened to be aware of this obscure little tidbit of information, then.> > The point is, and you as a long time developer (I guess) should know it, > you can''t have everything done at once (lack of resources, and it takes > some time anyway) so you must prioritize.The issues here are not issues of prioritization but issues of denial. Your citation above is the first suggestion that I''ve seen (and by all appearances the first that anyone else participating in these discussions has seen) that the ZFS crew considers the fragmentation issue important enough to merit active attention in the future. Do you by any chance have any similar hint of recognition that RAID-Z might benefit from revamping as well? 
ZFS is open source and if> someone thinks that given feature is more important than the other > he/she should try to fix it or at least voice it here so ZFS > developers can possibly adjust their priorities if there''s good enough > and justified demand.That just won''t wash, Robert: as I noted above, the problem here has been denial that these are flaws at all, not just a debate about how to ''prioritize'' addressing them (though in the case of RAID-Z I recall seeing some indication that at least one person was interested in - or perhaps actually is - working on RAID-5-like support because they see problems with RAID-Z).> > Now the important part - quite a lot of people are using ZFS, from > desktop usage, their laptops, small to big production environments, > clustered environments, SAN environemnts, JBODs, entry-level to high-end arrays, > different applications, workloads, etc. And somehow you can''t find > many complaints about ZFS fragmentation.The entire basis for that database thread (initiated by someone else, you will note) was ZFS fragmentation, and a great deal of its content arose from the resistance of many here to the idea that it might constitute a problem *in that specific environment* (let alone more generally). Most environments actually aren''t all that performance-sensitive, so of course they don''t complain. Even if they run into problems, they just buy more hardware - because that''s what they''re used to doing: the idea that better software could eliminate the need to do so either doesn''t cross their minds at all or seems like too much of a pipe dream to take seriously. Trouble is, ZFS and its fanboys tout it as offering *superior* - not merely adequate - performance, whereas for some not-all-that-uncommon situations its performance can be worse *by over an order of magnitude* due to the fragmention which is designed into its operation and for which no current relief is available (nor was any relief apparently generally known to be projected for the future, until now). The fact that many installations may be able to laugh off an order-of-magnitude performance handicap is not the point: the point is that if the claims for ZFS had been more balanced in this area, I''d have far less to criticize - I''d just observe that there was significant room for improvement and leave it at that. ...> Then you find people like Pawel Jakub Davidek (guy who ported ZFS to > FreeBSD) who started experimenting with RAID-5 like implementation > with ZFS - he provided even some numbers showing it might be worth > looking at. That''s what community is about.Ah - that may be what I was recalling above. Strange, once again, that it never popped up in the current discussions until now.> > I don''t see any point complaining about ZFS all over again - have you > actually run into the problem with ZFS yourself? I guess not.I haven''t been sent to Guantanamo and held for years without trial, either - but that doesn''t mean that I have no business criticizing the practice, and in particular persisting if that criticism is met with denial that any problem exists (even though indeed it''s not *my* problem). You just> assuming (correctly for some usage cases). I guess your message has > been well heard.But hardly well understood. 
Since you''re not interested in anything more that> bashing or complaining all the time about the same theoretical "issues" rather > than contributing somehow (even by providing some test results which > could be repeated)I''ve told you what I''m doing, and why I''m doing it, and why it''s beyond stupid to complain about the lack of ''test results'' in situations as clear-cut as these are, and how to go about fixing them - and you still come back with crap like this. Is it any wonder that my respect for so many of you is close to zero? I wouldn''t wait for any positive feedback if I were> you - anyway, what kind of feedback are you waiting for?I''m waiting for the idiots either to shut up or to shape up. And I remain sufficiently (though now verging on perversely) curious about just how long that will take to keep working on it.> > > cyg> Last I knew, ZFS was still claiming that it needed nothing like > cyg> defragmentation, while describing write allocation mechanisms > cyg> that could allow disastrous degrees of fragmentation under > cyg> conditions that I''ve described quite clearly. > > Well, I haven''t talked to ZFS (yet) so I don''t know what he claims :))Perhaps you should do *your* ''work home'' more properly, then: there are several developers who have presumed to speak for ZFS over the years, and their statements are well documented (you could start with the presentations that they''ve made).> If you are talking about ZFS developers then you can actually find > some evidence that they do see that problem and want to work on it. > Again see for example: http://www.opensolaris.org/jive/thread.jspa?messageID=139680𢆠 > Bill, at least look at the list archives first.I believe that I covered that adequately above, Robert. But given your demonstrated inability to absorb information even after several repetitions, I''ll suggest that you simply keep working on understanding it (and the rest of this response) until you actually *do* understand it, before attempting to reply to it.> > And again, "under conditions that I''ve described quite clearly." - > that''s exactly the problem. You''ve just described something while > others do have actual and real problems which should be addressed > first.Once again, you are confusing the very real problem of stone-wall denial here with a simple issue of prioritization.> > > cyg> If ZFS made no > cyg> efforts whatsoever in this respect the potential for unacceptable > cyg> performance would probably already have been obvious even to its > cyg> blindest supporters, > > Well, is it really so hard to understand that a lot of people use ZFS > because it actually solves their problems?Not at all: it''s just far from obvious that it solves their problems any (let alone significantly) better than other existing open source options. And that would not be any issue if some people here weren''t so zealous in asserting ZFS''s alleged stunning superiority - but if they continue to do so, I''ll continue to challenge them to *substantiate* that claim. No matter what case> scenarios you will find to theoretically show some ZFS weaker points, > at the end what matters is if it does solve customer problems. And for > many users it does, definitely not for all of them. > I would argue that no matter what file system you will test or even > design, one can always find a corner cases when it will behave less > than optimal. 
For a general purpose file system what matters is that > in most common cases it''s good enough.And if "It''s good enough" were all that people were claiming about ZFS there''d be very little to dispute (though no less room for improvement, of course - and probably a great deal less resistance to suggestions of how to go about it). ...> cyg> Then there''s RAID-Z, which smears individual blocks across > cyg> multiple disks in a manner that makes small-to-medium random > cyg> access throughput suck. Again, this is simple logic and physics: > cyg> if you understand the layout and the disk characteristics, you > cyg> can predict the effects on a heavily parallel workload with > cyg> fairly decent accuracy (I think that Roch mentioned this casually > cyg> at one point, so it''s hardly controversial, and I remember > cyg> reading a comment by Jeff Bonwick that he was pleased with the > cyg> result of one benchmark - which made no effort to demonstrate the > cyg> worst case - because the throughput penalty was ''only'' a factor > cyg> of 2 rather than the full factor of N). > > Yeah, nothing really new here. If you need a guy from Sun, then read > Roch''s post on RAID-Z performance. Nothing you''ve discovered here.Hmmm. I took a quick look through Roch''s posts here and didn''t find a title that suggested such a topic (though he does tend to get involved in discussions that are also of interest to me, so the time wasn''t completely wasted). If you''re referring to his mid-2006 blog post, had you read the discussion that followed it you would have found that I participated actively and in fact raised many of the same issues that I''ve raised again here (points that he either hadn''t covered or hadn''t realized had alternatives that did not suffer from comparable limitations, plus more general observations on the fragmentation problem). Incidentally (since comments to that post are now closed), his IOPS calculation at the end was flawed: the formula he presented yielded not the number of disks to use in each group but the number of groups to use.> Nevertheless RAID-Z[2] is good enough for many people. > I know that simple logic and physics states that relativity equations > provide better accuracy than Newton''s - nevertheless in most scenarios > I''m dealing with it doesn''t really matter from a practical point of > view.Given your expressed preference for ''real problems'' above, it''s worth noting that in my quick scan through Roch''s posts here I happened upon this (referring to performance issues using RAID-Z): "Now I have to find a way to justify myself with my head office that after spending 100k+ in hw and migrating to "the most advanced OS" we are running about 8 time slower :)" Some people might consider such a problem to be ''real'' (and somewhat personal as well); he goes on to observe that "while that rsync process is running, ZONEX is completely unusable because of the rsync I/O load" - another ''real-world'' indication of how excessive (and unnecessary) RAID-Z disk loading compromises other aspects of system performance (though limited scheduling intelligence may have contributed to this as well). Since I stumbled upon that without even looking for it or scanning more than a minute fraction of 1% of the posts here, there''s an excellent possibility that considerably more such are lurking elsewhere in this forum (want to do some ''work home'' and find out?).> > Then, in some environments RAID-Z2 (on JBOD) actually provides better > performance than RAID-5 (and HW R5 for that matter). 
And, opposite > to you, I''m not speculating but I''ve been working with such > environment (lot of concurrent writes which are more critical than > much less reads later).Don''t confuse apples with oranges. As long as it can accumulate enough dirty data before it has to flush it to disk, COW with batch write-back can make *any* write strategy work well. So there''s no need to accept the brain-damaged nature of RAID-Z''s performance with small-to-medium-sized random accesses in order to obtain the good performance that you describe above: a good ZFS RAID-5-like implementation could do just as well for those workloads *plus* beat both conventional RAID-5 and RAID-Z at small-update workloads *plus* cremate RAID-Z in terms of throughput on small-to-medium read workloads. The main limitation of the straight-forward way to implement this is that it would only be easily applicable to multi-block files, because each stripe could contain data from only one file (so as to avoid an additional level of access indirection); of course, in principle you could stripe a file as small as four disk sectors (2 KB) across 4 disks plus one for parity, so this approach would be inapplicable only to *tiny* files - around the size that one might start considering embedding in their disk inode, given a design that allowed that flexibility. While small files may get a large share of the access load in some environments, in most environments they consume only a small proportion of the storage space, so just leaving them to be mirrored would probably be an eminently viable strategy - and exploring more interesting alternatives wouldn''t be productive anyway until you''ve managed to understand the basic one.> So when you saying that RAID-Z is brain-damaging - well, it''s > mostly positive experience of a lot of people with RAID-Z vs. your statement without any > real-world backing.I just provided one example above from a participant in this forum (and it seems unlikely that it''s the only one). Does that mean that I get to accuse you of not having "done your work home properly", because you were unaware of it? ...> cyg> And the way ZFS aparently dropped the ball on its alleged > cyg> elimination of any kind of ''volume management'' by requiring that > cyg> users create explicit (and matched) aggregations of disks to > cyg> support mirroring and RAID-Z. > > # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4 > # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4 > # zpool list > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > bill 373M 90K 373M 0% ONLINE - > # > # mkfile 128m f11 ; mkfile 256m f44 > # zpool destroy bill > # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44 > # zfs list > NAME USED AVAIL REFER MOUNTPOINT > bill 101K 715M 32.6K /bill > # > (2*128+2*256=768) - looks fine. > > If you are talking about a solution which enables user to mix > different disk sizes in the same mirror or RAID-5 group and while all > the time providing given protection allows you to utilize 100% of all > disk capacities.... well, what is that solution? Is it free? > Open source? Available on general purpose OS? Or commodity HW? > Available at all? 
:PI''m talking about what ZFS *could* have provided to make good on their claim that they had eliminated (or at least effectively hidden) volume-management: a *real* ''storage pool'' that just accepted whatever disks you gave it and could be used transparently to provide whatever form of redundancy was desired on a per-file basis, with the ability to add or remove individual disks at will. No need to create separate pools for non-redundant data, mirrors, parity RAID, etc.: it would ''just work'', in the manner that some people would like to claim ZFS already does (and to some degree perhaps it actually does, but not when it comes to redundant storage). And yes, across a very wide range of disk-size variations it''s possible to utilize 100% of the capacity of each individual disk in such a pool using relatively simple distribution strategies - especially if you can perform very minor rearrangements to cover corner cases (though ZFS-style snapshots would hinder that, which is one of the reasons - defragmentation being another, and rebalancing across multiple nodes being a third - that I favor a different snapshot approach). I described this here well over a year ago, and Bill Moore said they had actually considered it but had shelved it for various reasons (none of which appeared insurmountable - but he may have been making different assumptions about how it could be implemented).> > > cyg> Now, if someone came up with any kind of credible rebuttal to > cyg> these assertions we could at least discuss it on technical > cyg> grounds. But (and again you should consider this significant) no > cyg> one has: all we have is well-reasoned analysis on the one hand > cyg> and some (often fairly obnoxious) fanboy babble on the other. If > cyg> you step back, make the effort required to *understand* that > cyg> analysis, and try to look at the situation objectively, which do you find more credible? > > Most credible to me is actual user experience than some theoretical > burbling.That''s usually the case with amateurs who have difficulty understanding in detail how the systems that they use work. But at least many of them have the sense not to argue interminably with people who have actually designed and built such systems and *do* understand them in (excruciating) detail. ...> cyg> ZFS has other deficiencies, but they''re more fundamental choices > cyg> involving poor trade-offs and lack of vision than outright (and > cyg> easily rectifiable) flaws, so they could more justifiably be > cyg> termed ''judgment calls'' and I haven''t delved as deeply into them. > > And what they are?Once again, you''ve failed to do your ''work home'' - since I''ve mentioned them here previously: 1. Implementation tied to a centralized server - scales only ''up'', not ''out''. 2. Snapshot mechanism that makes reorganization expensive (including reorganization across nodes - so it''s a scaling impediment as well as a performance trade-off). 3. Explicit pointer (indirect block) trees for large files rather than a flatter mechanism that avoids deep tree look-ups (with high-level data distribution handled algorithmically - which also helps avoid the need to update pointers in bulk when inter-node rebalancing operations occur and confines pointer updates on writes to the node that holds the data). 4. 
Trying to use block size to manage both access granularity and on-disk contiguity for performance (though background reorganization could help the latter and leave the former free to adjust just for access granularity - so that design choice could be considered one of the flaws already discussed above). There were probably more, but as you likely wouldn''t understand them any better than you''ve understood anything else there''s little point in dredging them up again. ...> cyg> But they''re the main reason I have no interest in ''working on'' > > Well, you''re not using ZFS, you are not interested in working on it, > all you are interested is finding some potential corner cases bad for > ZFS and bashing it. If you put at least 10% of your energy you''re > putting in your ''holy war'' you would at least provide some benchmarks > (filebench?) showing these corner cases in comparison to other > mind-blowing solutions on the market which are much better than ZFS, > so we can all reproduce them and try to address ZFS problems.I really don''t have much interest in meeting *your* criteria for being convinced, Robert - at least in part because it''s not clear that *anything* would convince you. So it''s more fun to see how completely committed people like you are to keeping their heads firmly wedged up where the sun don''t shine to avoid actually facing up to the fact that ZFS just ain''t quite what you thought it was. ...> cyg> You really haven''t bothered to read much at all, have you. I''ve > cyg> said, multiple times, that I came here initially in the hope of > cyg> learning something interesting. More recently, I came here > cyg> because I offered a more balanced assessment of ZFS''s strengths > cyg> and weaknesses in responding to the Yager article and wanted to > cyg> be sure that I had not treated ZFS unfairly in some way - which > cyg> started this extended interchange. After that, I explained that > cyg> while the likelihood of learning anything technical here was > cyg> looking pretty poor, I didn''t particularly like some of the > cyg> personal attacks that I''d been subject to and had decided to confront them. > > Well, every time I saw it was you ''attacking'' other people first.Then you obviously missed a great many posts, but given the readily-apparent quality of your other research I don''t find that surprising at all. ...> If you are not contributing here, and you are not learning here - wy > are you here? I''m serious - why?I explained that, in detail, in my previous post. Given the expressed ''seriousness'' of your repeat question here I was going to ask whether you are functionally illiterate, but your advice below brought up an another possibility. ...> cyg> No, my attitude is that people too stupid and/or too lazy to > cyg> understand what I *have* been delivering don''t deserve much respect if they complain. > > Maybe you should thing about that "stupid" part...As usual, I thought about it *before* I said it. However, I did inadvertently omit a third possibility - that people such as you (who don''t quite strike me as being abjectly stupid or drop-dead lazy) are instead simply too intellectually dishonest (whether intentionally or so habitually that it has become subconscious) to understand what I''ve been ''delivering''. 
So you're right: there's always room to refine one's understanding, and another relevant quotation comes to mind ("There are none so blind as those who will not see").

> Maybe, just maybe, it's possible that all people around you don't
> understand you, that the world is wrong and we're all so stupid. Well,
> maybe. Even if it is so, then perhaps it's time to stop being Don Quixote
> and move on?

No, but it might be getting close to it - I'll let you know.

- bill

This message posted from opensolaris.org
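A side note for readers trying to picture the "hand it whatever disks you have" pool described at the top of the message above: the sketch below is only a toy - the greedy "put each copy on the disk with the most free space" rule and the disk sizes are assumptions made for illustration, not anything ZFS or any shipping product implements - but it shows how mirrored data can consume essentially all of a mismatched set of disks.

    import heapq

    class ToyPool:
        """Toy allocator: each block asks for `copies` replicas, and every
        replica goes onto the disk that currently has the most free space."""

        def __init__(self, disk_sizes_gb):
            # heapq is a min-heap, so store negated free space to pop the
            # disk with the most room first
            self.heap = [(-size, disk_id) for disk_id, size in enumerate(disk_sizes_gb)]
            heapq.heapify(self.heap)

        def allocate(self, block_gb, copies):
            picked = [heapq.heappop(self.heap) for _ in range(copies)]
            fits = all(-neg_free >= block_gb for neg_free, _ in picked)
            for neg_free, disk_id in picked:
                new_free = -neg_free - block_gb if fits else -neg_free
                heapq.heappush(self.heap, (-new_free, disk_id))
            return fits

    sizes = [500, 300, 200, 120]          # GB; deliberately mismatched disks
    pool = ToyPool(sizes)
    stored = 0
    while pool.allocate(1, copies=2):     # 1 GB blocks, 2-way mirrored
        stored += 1
    print(stored, "GB mirrored; ceiling is", sum(sizes) // 2, "GB")

With these made-up sizes the loop reaches the ceiling of half the total capacity (possible because no single disk holds more than half of it), and per-file redundancy within one pool then amounts to calling the same allocator with copies=1 or copies=3 - which is the idea being argued about here, not a description of how ZFS actually allocates.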
Hello can,

I haven't been wasting so much time as in this thread... but from time to time it won't hurt :) More below :)

Wednesday, December 12, 2007, 4:46:42 PM, you wrote:

>> Hello Bill,
>> I know, everyone loves their baby...

cyg> No, you don't know: you just assume that everyone is as biased
cyg> as you and others here seem to be.

Which in turn is just your assumption :)

>> I've never said there are not fragmentation problems with ZFS.

cyg> Not having made a study of your collected ZFS contributions here
cyg> I didn't know that. But some of ZFS's developers are on record
cyg> stating that they believe there is no need to defragment (unless
cyg> they've changed their views since and not bothered to make us
cyg> aware of it), and in the entire discussion in the recent 'ZFS +
cyg> DB + "fragments"' thread there were only three contributors
cyg> (Roch, Anton, and I) who seemed willing to admit that any problem existed.

Which ZFS developer said that there's no need to defragment in ZFS?

cyg> So since one of my 'claims' for which you requested
cyg> substantiation involved fragmentation problems, it seemed appropriate to address them.

I would say that right now there are other, more important things to be done in ZFS than addressing fragmentation. While in one environment it looks like lowering fragmentation would help with some issues, in all the other environments I haven't run into a fragmentation problem.

>> Also you haven't done your work home properly, as one of ZFS
>> developers actually stated they are going to work on ZFS
>> de-fragmentation and disk removal (pool shrinking).
>> See http://www.opensolaris.org/jive/thread.jspa?messageID=139680

cyg> Hmmm - there were at least two Sun ZFS personnel participating
cyg> in the database thread, and they never mentioned this. I guess
cyg> they didn't do their 'work home' properly either (and unlike me they're paid to do it).

Maybe they don't know? Different project, different group?

My understanding (I might be wrong) is that what they are actually working on is disk removal from a pool (which looks to be much more requested by people than fixing the fragmentation 'problem'). In order to accomplish it you need a mechanism to re-arrange data in a pool, which as a side effect could also be used as a de-fragmentation tool. That doesn't mean the pool won't fragment again in the future - if it's a real problem in a given environment.

>> The point is, and you as a long time developer (I guess) should know it,
>> you can't have everything done at once (lack of resources, and it takes
>> some time anyway) so you must prioritize.

cyg> The issues here are not issues of prioritization but issues of
cyg> denial. Your citation above is the first suggestion that I've
cyg> seen (and by all appearances the first that anyone else
cyg> participating in these discussions has seen) that the ZFS crew
cyg> considers the fragmentation issue important enough to merit active attention in the future.

Jeeez... now you need some kind of acknowledgement from ZFS developers every time you think you found something? Are you paying their bills or what? While it's fine to talk about theoretical/hypothetical problems, I'm not entirely sure this is a good place to do it. On the other hand you can very often find ZFS developers responding on this list (and not only here) to actual user problems.
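An aside on Robert's point above that disk removal needs a general "move the data and remember where it went" mechanism which a defragmenter could reuse: the toy sketch below (all names invented for illustration; it does not describe how ZFS implements or plans to implement either feature) shows why the two features are largely the same mover driven by different policies.

    class ToyMover:
        """Tracks where each block lives, plus a remap table so that stale
        pointers to an old location can still be resolved after a move."""

        def __init__(self):
            self.location = {}   # block id -> (device, offset)
            self.remap = {}      # old (device, offset) -> new (device, offset)

        def resolve(self, addr):
            # follow remap entries until the current location is reached
            while addr in self.remap:
                addr = self.remap[addr]
            return addr

        def move(self, block, new_addr):
            old_addr = self.location[block]
            self.remap[old_addr] = new_addr
            self.location[block] = new_addr

        def evacuate(self, device, pick_new_home):
            """Policy 1 (device removal / pool shrink): empty one device."""
            for block, (dev, _) in list(self.location.items()):
                if dev == device:
                    self.move(block, pick_new_home(block))

        def defragment(self, is_badly_placed, pick_new_home):
            """Policy 2 (defragmentation): same mover, different selection rule."""
            for block in list(self.location):
                if is_badly_placed(block):
                    self.move(block, pick_new_home(block))

The genuinely hard parts in a real filesystem - keeping snapshot and clone block pointers valid, doing the moves online, and bounding the remap table - are what make such a feature slow to deliver, which fits the "it's a matter of priorities" reading rather than the "denial" one.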
Another problem, I guess, could be that they have already spent a lot of their time on projects they have to deliver - do you really expect them to spend still more time analyzing some loose statements of yours? Come on, they also have their private lives and other things to do. Ignoring their customers/users would be unwise; responding to everyone with every problem, especially when it's not a real user-experience problem, would be just impractical.

Then there is your attitude - you know, there are very good reasons why people at interviews check whether you can actually work with other people in a group. You're a very good example of why. Don't expect people to take you seriously if you behave the way you do. As you put it before - you get what you deserve. You probably got even more attention here than you deserved.

I guess that you are another good engineer, quite skillful, but unfortunately unable to work in a team, and definitely not with customers. I would say some people here recognized that in you and did their best to treat you seriously and actually hear you - it's just that everyone has his limits. Looking through your posts here, you can find lots of words, some technical input, but not much actual value - at first it could be entertaining, even intriguing, but it quickly becomes irritating. Bill, you could be the best engineer in the world, but if you can't communicate it you'll be the only person who recognizes it. Or perhaps some people here (not only here) are right and for whatever reasons you are just trolling.

cyg> Do you by any chance have any similar hint of recognition that
cyg> RAID-Z might benefit from revamping as well?

Maybe I do :))))

Seriously - is RAID-Z the best RAID humanity can think of? Nope. I doubt one exists for all cases, if even for one. Is RAID-Z an improvement in some real workloads compared to RAID-5? Sure, it is. In other environments RAID-5 delivers better performance. As usual - it's hard to satisfy everyone.

Maybe (definitely) what would be useful in ZFS is to actually have a RAID-5-like implementation along with RAID-Z, so write performance will suffer but concurrent random read performance will be much better. Now, perhaps (I haven't analyzed it) one could create a RAID-5-like implementation which offers similar logical storage capacity, always-consistent on-disk state, and the best performance for concurrent small random reads, concurrent small random writes, sequential reads, mixed workloads, etc. Even if there is one, maybe it's just hard to implement and no one has done it so far - are you volunteering? Users would love you - of that you can be sure.

>> ZFS is open source and if
>> someone thinks that a given feature is more important than another
>> he/she should try to fix it or at least voice it here so ZFS
>> developers can possibly adjust their priorities if there's good enough
>> and justified demand.

cyg> That just won't wash, Robert: as I noted above, the problem
cyg> here has been denial that these are flaws at all, not just a
cyg> debate about how to 'prioritize' addressing them (though in the

There's a lot of denial here - but mostly from you. You just can't understand why so many people are excited about ZFS, so you try to persuade all of them that they are wrong. Well, mister Bill - you are. Just because you've been mostly ignored here, for very good reasons, it doesn't mean all the other users are - quite often it's the opposite.
You behave like those people who to this day can't understand why DTrace is so revolutionary, and somehow claim it's all been done before, that there's hardly anything new, etc. Of course, like you, those are people who don't actually use it and can't accept the fact that DTrace has greatly impacted many environments. The same goes for ZFS, and will even more so in the future, I believe. You may not like it, you may disagree with it; in the end it's up to the users (sysadmins, etc.).

cyg> Most environments actually aren't all that
cyg> performance-sensitive, so of course they don't complain. Even if
cyg> they run into problems, they just buy more hardware - because
cyg> that's what they're used to doing: the idea that better software
cyg> could eliminate the need to do so either doesn't cross their
cyg> minds at all or seems like too much of a pipe dream to take seriously.

Try to keep focused, please. If an environment is not performance-sensitive it won't run into a performance problem one would need to fix by throwing more HW at it or some other way.

cyg> Trouble is, ZFS and its fanboys tout it as offering *superior* -
cyg> not merely adequate - performance, whereas for some
cyg> not-all-that-uncommon situations its performance can be worse *by
cyg> over an order of magnitude* due to the fragmentation which is
cyg> designed into its operation and for which no current relief is
cyg> available (nor was any relief apparently generally known to be
cyg> projected for the future, until now). The fact that many

Mr. Troll - where did that order of magnitude come from? I guess out of your head, again. Will ZFS deliver less performance in some cases compared to other products? Sure. Will it deliver better performance in other cases? Sure. But it's not only about performance, and there'll be a market for specialized, niche products for many years to come. I can find a corner case for every common file system on the market and show it performing badly - not a big deal. Unless that secret BillFS is out there somewhere...

>> You're just
>> assuming (correctly for some usage cases). I guess your message has
>> been well heard.

cyg> But hardly well understood.

Let me repeat myself. Bill, you could be the best engineer in the world, but if you can't communicate it you'll be the only person who recognizes it. So either you can't communicate with other people, or there's not much practical value in what you're saying.

cyg> Is it any
cyg> wonder that my respect for so many of you is close to zero?

Judging from your posts so far - no surprise at all. It shouldn't be a surprise for you either that many of us feel the same regarding you. So maybe just leave us alone?

>> I wouldn't wait for any positive feedback if I were
>> you - anyway, what kind of feedback are you waiting for?

cyg> I'm waiting for the idiots either to shut up or to shape up.

Sorry for being blunt - I suggest starting with yourself. Bill, really - step back, go to the cinema, read a book, forget about ZFS for some time at least, and the world will look much better to you. You'll see.

>> And again, "under conditions that I've described quite clearly." -
>> that's exactly the problem. You've just described something while
>> others do have actual and real problems which should be addressed
>> first.

cyg> Once again, you are confusing the very real problem of
cyg> stone-wall denial here with a simple issue of prioritization.

As it looks like there is actually some work being done regarding fragmentation in ZFS, you're simply mistaken.
There's no stone-wall denial here. And whether you like it or not - it's mostly a matter of priorities.

>> Well, is it really so hard to understand that a lot of people use ZFS
>> because it actually solves their problems?

cyg> Not at all: it's just far from obvious that it solves their
cyg> problems any (let alone significantly) better than other existing
cyg> open source options. And that would not be any issue if some
cyg> people here weren't so zealous in asserting ZFS's alleged
cyg> stunning superiority - but if they continue to do so, I'll
cyg> continue to challenge them to *substantiate* that claim.

Well - I've been using ZFS in production for years now. It has helped me with data corruption on many occasions. It has saved a lot of money, as it was at least good enough compared to NetApp, for example. Storage management with lots of TBs has been a much, much better experience since then. RAID-Z2 is helpful too. Built-in compression has helped on many occasions too. No need for /etc/vfstab or /etc/dfs/dfstab has made it still simpler to manage. Free snapshots and clones are a very helpful feature. ZFS+Zones - that combination just rocks - I just don't want to go back to the "old ways". Performance - well, in some environments much better performance, especially for writes. Quick resilvering and quick file system or RAID creation - really cool. I can't stand waiting 24 hours for a new HW RAID-5 array to build... with ZFS it's just a couple of seconds.

For the most part it's about the entire package ZFS delivers - a lot of the stuff you can find here and there, but with ZFS you've got it all on one plate. And it's free, open source, and it works on commodity hardware as well as on high-end gear. It's just changing some of the economics. ZFS is brilliant as a whole package.

cyg> "Now I have to find a way to justify myself with my head office
cyg> that after spending 100k+ in hw and migrating to "the most
cyg> advanced OS" we are running about 8 times slower :)"
cyg> Some people might consider such a problem to be 'real' (and
cyg> somewhat personal as well); he goes on to observe that "while
cyg> that rsync process is running, ZONEX is completely unusable
cyg> because of the rsync I/O load" - another 'real-world' indication
cyg> of how excessive (and unnecessary) RAID-Z disk loading
cyg> compromises other aspects of system performance (though limited
cyg> scheduling intelligence may have contributed to this as well).

Well, then you can often find people buying some mid-range array, configuring RAID-5 out of 12 or more disks, putting a lot of writes on it and complaining: why the hell did I spend so much money only to get such bad performance? Unfortunately, you've still got to understand the technology to make proper use of it. RAID-Z has its good points and its bad points - the small random reads case is its weak point.

>> Then, in some environments RAID-Z2 (on JBOD) actually provides better
>> performance than RAID-5 (and HW R5 for that matter). And, opposite
>> to you, I'm not speculating but have been working with such an
>> environment (lots of concurrent writes, which are more critical than
>> the much less frequent reads later).

cyg> Don't confuse apples with oranges. As long as it can accumulate
cyg> enough dirty data before it has to flush it to disk, COW with
cyg> batch write-back can make *any* write strategy work well.
cyg> So there's no need to accept the brain-damaged nature of RAID-Z's
cyg> performance with small-to-medium-sized random accesses in order
cyg> to obtain the good performance that you describe above: a good
cyg> ZFS RAID-5-like implementation could do just as well for those
cyg> workloads *plus* beat both conventional RAID-5 and RAID-Z at
cyg> small-update workloads *plus* cremate RAID-Z in terms of
cyg> throughput on small-to-medium read workloads.

But still - you're comparing RAID-Z to some non-existent implementation. What I care about is what I can get from the market to solve my problems and how much it would cost me. Like it or not, ZFS is quite often the best choice. But yes, Bill, I agree with you - that non-existent implementation of yours is better than ZFS. And remember, RAID-Z is not ZFS - it's just a small part of it.

>> So when you say that RAID-Z is brain-damaged - well, it's
>> mostly positive experience of a lot of people with RAID-Z vs. your statement without any
>> real-world backing.

cyg> I just provided one example above from a participant in this
cyg> forum (and it seems unlikely that it's the only one). Does that
cyg> mean that I get to accuse you of not having "done your work home
cyg> properly", because you were unaware of it?

Again, a wrong assumption of yours. I was aware of it. That's why I wrote "mostly positive experience". As usual you'll find corner cases, and for lack of knowledge people hurt themselves - a problem as old as humanity. I also haven't used RAID-Z in all cases, because I needed more performance in a specific environment - nothing really hard to understand.

cyg> I'm talking about what ZFS *could* have provided to make good on

OK, if you're comparing ZFS to a non-existent implementation of your version of ZFS - then maybe you're right, current ZFS would look pale. Go ahead - implement it! Or help make current ZFS better.

cyg> their claim that they had eliminated (or at least effectively
cyg> hidden) volume-management: a *real* 'storage pool' that just

Have you ever worked with storage?

cyg> And yes, across a very wide range of disk-size variations it's
cyg> possible to utilize 100% of the capacity of each individual disk
cyg> in such a pool using relatively simple distribution strategies -
cyg> especially if you can perform very minor rearrangements to cover

Go ahead and implement it! Or maybe you can point me to some implementation I can use?

cyg> That's usually the case with amateurs who have difficulty
cyg> understanding in detail how the systems that they use work. But
cyg> at least many of them have the sense not to argue interminably
cyg> with people who have actually designed and built such systems and
cyg> *do* understand them in (excruciating) detail.

I'm sorry that the file system you've been working on hasn't generated so much excitement in users - maybe you should try working harder? Since you've got no interest in making ZFS better and you're suggesting you are an expert in file systems - well, just deliver all the stuff you're talking about. Make it open source, available on the main platforms, and the world will love you. You know, it's not about the ideas, or at least not only - it's about delivering the actual product, which often means some compromises instead of pursuing a holy grail. They did deliver ZFS, which is not perfect but still much, much better and more promising in many respects than what we've got on the market.
And no surprise NetApp is afraid - Solaris+ZFS (or maybe even FreeBSD+ZFS) is a potential NetApp killer, at least for the NetApp we know. And you haven't actually delivered anything like it. Since you're not interested in improving ZFS, it's hard to call you an expert. In the end, what users care about are not promises but an actual product - where's yours? Why does no one care?

cyg> I really don't have much interest in meeting *your* criteria for
cyg> being convinced, Robert - at least in part because it's not clear
cyg> that *anything* would convince you. So it's more fun to see how
cyg> completely committed people like you are to keeping their heads
cyg> firmly wedged up where the sun don't shine to avoid actually
cyg> facing up to the fact that ZFS just ain't quite what you thought it was.

First, I don't want to convince you of anything - I doubt it's possible. You've got your hidden agenda or troll mentality and there's probably not much I can do about it. You are just a guy who feels he needs to convince everyone they are wrong, while not even using the technology, and who keeps referring to some non-existent technology. People are excited about ZFS because they can use it - whether it's perfect or not doesn't matter much. What matters is that it often offers them a much better experience than other *available* technologies, for free and in open source form - which changes the economics for some environments.

>> Maybe, just maybe, it's possible that all people around you don't
>> understand you, that the world is wrong and we're all so stupid. Well,
>> maybe. Even if it is so, then perhaps it's time to stop being Don Quixote
>> and move on?

cyg> No, but it might be getting close to it - I'll let you know.

Don't bother - I really don't care.

Bill - I don't think there's a point in continuing that discussion. At least I see no point. I know you tried your best - unfortunately you haven't convinced me and many others (if anyone). Somehow ZFS is still solving some problems better and cheaper than other solutions and I'm quite happy using it. As many other people are too.

-- 
Best regards,
Robert Milkowski
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
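An aside for anyone weighing the RAID-Z versus RAID-5 claims traded above: the usual back-of-envelope argument is that RAID-Z stripes each block across the data disks of the group and verifies its checksum, so a small random read occupies the whole group, while classic RAID-5 serves a small read from a single disk. The numbers below are invented (an 8-disk group, 150 random reads per second per spindle), caching is ignored, and the write side is deliberately left out since - as the quoted text itself notes - copy-on-write batching changes it completely; the model is only meant to show where "order of magnitude" figures for the read-heavy corner case tend to come from.

    # crude read-side model only; both figures below are assumptions
    DISKS = 8          # disks in the group
    DISK_IOPS = 150    # small random reads per second per spindle

    raid5_reads = DISKS * DISK_IOPS   # each small read lands on one disk
    raidz_reads = DISK_IOPS           # each small read touches the whole stripe

    print("RAID-5-style small random reads/s ~", raid5_reads)   # ~1200
    print("RAID-Z small random reads/s       ~", raidz_reads)   # ~150
    print("ratio ~ %dx" % (raid5_reads // raidz_reads))         # ~8x

For streaming or write-heavy workloads the comparison narrows or flips, which is consistent with the "it depends on the environment" position argued above.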
...

> Bill - I don't think there's a point in continuing
> that discussion.

I think you've finally found something upon which we can agree. I still haven't figured out exactly where on the stupid/intellectually dishonest spectrum you fall (lazy is probably out: you have put some effort into responding), but it is clear that you're hopeless.

On the other hand, there's always the possibility that someone else learned something useful out of this. And my question about just how committed you were to your ignorance has been answered. It's difficult to imagine how someone so incompetent in the specific area that he's debating can be so self-assured - I suspect that just not listening has a lot to do with it - but also kind of interesting to see that in action.

- bill

This message posted from opensolaris.org
Look, it's obvious this guy talks about himself as if he is the person he is addressing. Please stop taking this personally and feeding the troll.

can you guess? wrote:
>> Bill - I don't think there's a point in continuing
>> that discussion.
>
> I think you've finally found something upon which we can agree. I still haven't figured out exactly where on the stupid/intellectually dishonest spectrum you fall (lazy is probably out: you have put some effort into responding), but it is clear that you're hopeless.
>
> On the other hand, there's always the possibility that someone else learned something useful out of this. And my question about just how committed you were to your ignorance has been answered. It's difficult to imagine how someone so incompetent in the specific area that he's debating can be so self-assured - I suspect that just not listening has a lot to do with it - but also kind of interesting to see that in action.
>
> - bill
Hello can,

Thursday, December 13, 2007, 12:02:56 AM, you wrote:

cyg> On the other hand, there's always the possibility that someone
cyg> else learned something useful out of this. And my question about

To be honest - there's basically nothing useful in the thread, except perhaps one thing - it doesn't make any sense to listen to you. You're just unable to talk to people.

-- 
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Robert Milkowski wrote:
> Hello can,
>
> Thursday, December 13, 2007, 12:02:56 AM, you wrote:
>
> cyg> On the other hand, there's always the possibility that someone
> cyg> else learned something useful out of this. And my question about
>
> To be honest - there's basically nothing useful in the thread,
> except perhaps one thing - it doesn't make any sense to listen to you.
>
> You're just unable to talk to people.

Have to agree 100%. I did learn how to filter out things from CYG in my email program, though. Never had the need to do so before.

Overall, the effect of fragmentation will become more and more negligible as SSD drives become more prominent. I think the ZFS developers are concentrating on the more important issues. Where performance is needed, technology will overcome the effects of fragmentation.
People... for the umpteenth time, there are only two ways to kill a troll. One involves a woodchipper and the possibility of an unwelcome visit from the FBI, and the other involves ignoring them.

Internet Trolls:
http://en.wikipedia.org/wiki/Internet_troll
http://www.linuxextremist.com/?p=34

Another perspective:
http://sc.tri-bit.com/images/7/7e/greaterinternetfu#kwadtheory.jpg

The irony of this whole thing is that by feeding Bill's trollish tendencies, he has effectively eliminated himself from any job or contract where someone googles his name, which will give him an enormous amount of time to troll forums. Who in their right mind would consciously hire someone who randomly calls people idiots to avoid the topic at hand? Being unemployed will just piss him off more and his trolling will only get worse. Hence, you don't feed trolls!!

This message posted from opensolaris.org
> Hello can,
>
> Thursday, December 13, 2007, 12:02:56 AM, you wrote:
>
> cyg> On the other hand, there's always the possibility that someone
> cyg> else learned something useful out of this. And my question about
>
> To be honest - there's basically nothing useful in the thread,
> except perhaps one thing - it doesn't make any sense to listen to you.

I'm afraid you don't qualify to have an opinion on that, Robert - because you so obviously *haven't* really listened. Until it became obvious that you never would, I was willing to continue to attempt to carry on a technical discussion with you, while ignoring the morons here who had nothing whatsoever in the way of technical comments to offer (but continued to babble on anyway).

- bill

This message posted from opensolaris.org
Would you two please SHUT THE F$%K UP.

Dear God, my kids don't go on like this.

Please - let it die already. Thanks very much.

/jim

can you guess? wrote:
>> Hello can,
>>
>> Thursday, December 13, 2007, 12:02:56 AM, you wrote:
>>
>> cyg> On the other hand, there's always the possibility that someone
>> cyg> else learned something useful out of this. And my question about
>>
>> To be honest - there's basically nothing useful in the thread,
>> except perhaps one thing - it doesn't make any sense to listen to you.
>
> I'm afraid you don't qualify to have an opinion on that, Robert - because you so obviously *haven't* really listened. Until it became obvious that you never would, I was willing to continue to attempt to carry on a technical discussion with you, while ignoring the morons here who had nothing whatsoever in the way of technical comments to offer (but continued to babble on anyway).
>
> - bill
>
> This message posted from opensolaris.org
> Would you two please SHUT THE F$%K UP.

Just for future reference, if you're attempting to squelch a public conversation it's often more effective to use private email to do it rather than contribute to the continuance of that public conversation yourself.

Have a nice day!

- bill

This message posted from opensolaris.org