Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 01 May 2019, at 12:37, Karl Denninger <karl at denninger.net> wrote:
>
> On 4/30/2019 20:59, Michelle Sullivan wrote:
>>> On 01 May 2019, at 11:33, Karl Denninger <karl at denninger.net> wrote:
>>>
>>>> On 4/30/2019 19:14, Michelle Sullivan wrote:
>>>>
>>>> Michelle Sullivan
>>>> http://www.mhix.org/
>>>> Sent from my iPad
>>>>
>>> Nope.  I'd much rather *know* the data is corrupt and be forced to
>>> restore from backups than to have SILENT corruption occur and perhaps
>>> screw me 10 years down the road when the odds are my backups have
>>> long-since been recycled.
>> Ahh yes, the be-all and end-all of ZFS: it stops the silent corruption of
>> data.. but don't install it on anything unless it's server grade with
>> backups and ECC RAM, yet it's good on laptops because it protects you from
>> silent corruption of your data when, 10 years later, the backups have
>> long since been recycled... umm, is that not a circular argument?
>>
>> Don't get me wrong here.. and I know you (and some others) are running zfs
>> in the DC with 10s of thousands in redundant servers and/or backups to
>> keep your critical data corruption free = good thing.
>>
>> ZFS on everything is what some say (because it prevents silent
>> corruption), but then you have default policies to install it everywhere..
>> including hardware not equipped to function safely with it (by your own
>> arguments), and yet it's still good because it will still prevent silent
>> corruption even though it relies on hardware that you can't trust... umm,
>> say what?
>>
>> Anyhow, we've veered way, way off (the original) topic...
>>
>> A modest (part consumer grade, part commercial) system suffered
>> irreversible data loss because of a (very unusual, but not impossible)
>> double power outage.. and there are no tools to recover the data (or part
>> of the data) unless you have some form of backup, because the file system
>> deems the corruption to be too dangerous to let you access any of it
>> (even the known-good bits)...
>>
>> Michelle
>
> IMHO you're dead wrong Michelle.  I respect your opinion but disagree
> vehemently.

I guess we'll have to agree to disagree then, but I think your readiness to pronounce me "dead wrong" is short sighted, because it smacks of "I'm right because ZFS is the answer to all problems." .. I've been around in the industry long enough to see a variety of issues... some disasters, some not so... I also should know better than to run without backups, but financial constraints precluded me... as they will for many non-commercial people.

> I run ZFS on both of my laptops under FreeBSD.  Both have
> non-power-protected SSDs in them.  Neither is mirrored or Raidz-anything.
>
> So why run ZFS instead of UFS?
>
> Because a scrub will detect data corruption that UFS cannot detect *at all.*

I get it, I really do, but that balances out against: if you can't rebuild it, make sure you have (tested and working) backups and be prepared for downtime when such corruption does occur.

> It is a balance-of-harms test and you choose.  I can make a very clean
> argument that *greater information always wins*; that is, I prefer in
> every case to *know* I'm screwed rather than not.  I can defend against
> being screwed with some amount of diligence but in order for that
> diligence to be reasonable I have to know about the screwing in a
> reasonable amount of time after it happens.

Not disagreeing (and have not been.)

> You may have never had silent corruption bite you.

I have... but not with data on disks.. most of my silent corruption issues have been with a layer or two above the hardware... like subversion commits overwriting previous commits without notification (damn, I wish I could reliably replicate it!)

> I have had it happen
> several times over my IT career.
> If that happens to you the odds are
> that it's absolutely unrecoverable and whatever gets corrupted is
> *gone.*

Every drive corruption I have suffered in my career I have been able to recover from, all or partial data, except where the hardware itself was totally hosed (i.e. clean-room options only)... even with btrfs.. yuk.. puck.. yuk.. oh, what a mess that was... I still get nightmares about that one... but I still managed to get most of the data off... in fact I put it onto this machine I currently have problems with.. so after the nightmare of btrfs, it looks like zfs eventually nailed me.

> The defensive measures against silent corruption require
> retention of backup data *literally forever* for the entire useful life
> of the information because from the point of corruption forward *the
> backups are typically going to be complete and correct copies of the
> corrupt data and thus equally worthless to what's on the disk itself.*
> With non-ZFS filesystems quite a lot of thought and care has to go into
> defending against that, and said defense usually requires the active
> cooperation of whatever software wrote said file in the first place

Say what?

> (e.g. a database, etc.)

So DBs (any?) talk actively to the file systems (any?) to actively prevent silent corruption?

Lol...

I'm guessing you are actually talking about internal checks and balances of data in the DB to ensure that data retrieved from disk is not corrupt/altered... you know, like publishing sha256 checksums of files you might download from the internet so you can check you got what you asked for and it wasn't changed/altered in transit.

> If said software has no tools to "walk" said
> data or if it's impractical to have it do so you're at severe risk of
> being hosed.

Umm, what?
I'm talking about a userland (libzfs) tool (i.e. one that doesn't need the pool imported) that works like zfs send (which requires the pool to be imported -- hence me not calling it a userland tool) to allow sending whatever data can be found to other places, where it can be either blindly recovered (corruption might be present) or used to locate files/paths etc. that are known to be good (checksums match etc.)... walk the structures, feed the data elsewhere where it can be examined/recovered... don't alter it... it's a last-resort tool for when you don't have working backups..

> Prior to ZFS there really wasn't any comprehensive defense
> against this sort of event.  There are a whole host of applications that
> manipulate data that are absolutely reliant on that sort of thing not
> happening (e.g. anything using a btree data structure) and recovery if
> it *does* happen is a five-alarm nightmare if it's possible at all.  In
> the worst-case scenario you don't detect the corruption and the data
> that has the pointer to it that gets corrupted is overwritten and
> destroyed.
>
> A ZFS scrub on a volume that has no redundancy cannot *fix* that
> corruption but it can and will detect it.

So you're advocating restore from backup for every corruption... ok...

> This puts a boundary on the
> backups that I must keep in order to *not* have that happen.  This is of
> very high value to me and is why, even on systems without ECC memory and
> without redundant disks, provided there is enough RAM to make it
> reasonable (e.g. not on the embedded systems I do development on, which
> are severely RAM-constrained) I run ZFS.
>
> BTW if you've never had a UFS volume unlink all the blocks within a file
> on an fsck and then recover them back into the free list after a crash
> you're a rare bird indeed.  If you think a corrupt ZFS volume is fun try
> to get your data back from said file after that happens.

Been there, done that, though with ext2 rather than UFS.. still got all my data back...
even though it was a nightmare..

> --
> Karl Denninger
> karl at denninger.net <mailto:karl at denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/
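Karl's argument above (a scrub detects corruption that UFS cannot, even on a pool with no redundancy) boils down to checksumming every block at write time and re-verifying on read. Here is a toy sketch of that mechanism, not ZFS's actual code; the class and method names are mine:

```python
import hashlib

# Toy illustration (not ZFS code): store a checksum alongside each block at
# write time, then "scrub" by re-reading every block and comparing checksums.
# Without redundancy the corruption cannot be repaired, but it is *detected*
# rather than silently returned to the application.

class ChecksummedStore:
    def __init__(self):
        self.blocks = {}     # block id -> data
        self.checksums = {}  # block id -> sha256 hex digest

    def write(self, blkid, data):
        self.blocks[blkid] = data
        self.checksums[blkid] = hashlib.sha256(data).hexdigest()

    def scrub(self):
        """Return the ids of blocks whose contents no longer match."""
        return [b for b, data in self.blocks.items()
                if hashlib.sha256(data).hexdigest() != self.checksums[b]]

store = ChecksummedStore()
store.write(1, b"important data")
store.write(2, b"more data")
store.blocks[2] = b"more dat\xff"   # simulate silent on-disk corruption
print(store.scrub())                # -> [2]
```

A plain filesystem has no stored checksum to compare against, which is why the same bit-flip on UFS would come back to the application unnoticed.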
On Tue, Apr 30, 2019 at 8:19 PM Michelle Sullivan <michelle at sorbs.net> wrote:
>
> Michelle Sullivan
> http://www.mhix.org/
> Sent from my iPad
>
> [snip earlier quoting]
>
>>>> Nope.  I'd much rather *know* the data is corrupt and be forced to
>>>> restore from backups than to have SILENT corruption occur and perhaps
>>>> screw me 10 years down the road when the odds are my backups have
>>>> long-since been recycled.
>>> Ahh yes, the be-all and end-all of ZFS.. stops the silent corruption of
>>> data.. but don't install it on anything unless it's server grade with
>>> backups and ECC RAM, but it's good on laptops because it protects you
>>> from silent corruption of your data when 10 years later the backups have
>>> long-since been recycled... umm, is that not a circular argument?

ZFS works fine on systems without ECC.  According to one of the original
Sun authors of ZFS, the "scrub of death" is a myth; a non-ECC machine
running ZFS is no more at risk than one running UFS.

As far as backups go: you should have backups for your important data, and
that is true for any filesystem.  Home users have been making backups for
decades.  For anybody who has important data, not having a backup is a
false economy.  Odds are that most people who don't have backups could
afford them if they planned and allocated money for the task (8TB USB
drives are now at Costco for ~$140, so getting one should not be much of
an issue).  Tarsnap is a great and cheap online backup system.

Silent data corruption is a real thing.  CERN tested for it 10-15 years
ago on brand-new, high-end production hardware: a test across 3000 new
servers running for 1 week.
They found 147 silent data corruption errors across the server farm (found
thanks to ZFS error checking).

>>> Don't get me wrong here.. and I know you (and some others) are running
>>> zfs in the DC with 10s of thousands in redundant servers and/or backups
>>> to keep your critical data corruption free = good thing.
>>>
>>> ZFS on everything is what some say (because it prevents silent
>>> corruption), but then you have default policies to install it
>>> everywhere.. including hardware not equipped to function safely with it
>>> (by your own arguments), and yet it's still good because it will still
>>> prevent silent corruption even though it relies on hardware that you
>>> can't trust... umm, say what?

I run ZFS on embedded firewalls; it has worked fine for years.  ZFS by
default is a good idea on systems that have ZFS built in (such as FreeBSD
and SmartOS).  There are great things you can do with boot environments,
and the volume management is much nicer.  Don't let the blowhards who say
ZFS should only be run on server-grade (i.e. ECC) hardware spook you.  It
works as well as UFS would on any system with at least 1GB of RAM (though
I'd suggest getting at least 2GB if you can't get 4GB); you just need to
adjust a few memory parameters.

> [snip remainder of quoted message]
>
>> --
>> Karl Denninger
>> karl at denninger.net <mailto:karl at denninger.net>
>> /The Market Ticker/
>> /[S/MIME encrypted email preferred]/
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"

--
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding.
  -- Justice Louis D. Brandeis
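On the "adjust a few memory parameters" point for low-RAM machines: the usual knob on FreeBSD is capping the ARC via /boot/loader.conf. The values below are only illustrative assumptions for a ~2GB box, not recommendations from the posters above; tune to your own workload.

```
# /boot/loader.conf -- illustrative values only (assumed ~2GB of RAM).
# Cap the ZFS ARC so the rest of the system keeps enough memory.
vfs.zfs.arc_max="512M"
# On very constrained machines, disabling prefetch can also help.
vfs.zfs.prefetch_disable="1"
```

After a reboot, `sysctl vfs.zfs.arc_max` should report the new cap.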
On Apr 30, 2019, at 11:17 PM, Michelle Sullivan <michelle at sorbs.net> wrote:

>> I have had it happen
>> several times over my IT career.  If that happens to you the odds are
>> that it's absolutely unrecoverable and whatever gets corrupted is
>> *gone.*
>
> Every drive corruption I have suffered in my career I have been able to
> recover, all or partial data, except where the hardware itself was totally
> hosed (i.e. clean room options only available)... even with btrfs.. yuk..
> puck.. yuk.. oh what a mess that was... still get nightmares on that
> one... but I still managed to get most of the data off... in fact I put
> it onto this machine I currently have problems with.. so after the
> nightmare of btrfs looks like zfs eventually nailed me.

It sounds from reading this thread that FreeBSD's built-in tools for ZFS
recovery were insufficient for the corruption your pool suffered.  Have you
looked at the digital forensics realm to see whether those tools might help
you?  This article claims to extend The Sleuth Kit to support pooled
storage such as ZFS, and the authors even describe recovering the bulk of
an image file from a pool that has a disk missing (Evaluation section,
"Scenario C: reconstructing an incomplete pool"):

"Extending The Sleuth Kit and its underlying model for pooled storage file
system forensic analysis"
https://www.sciencedirect.com/science/article/pii/S1742287617301901

>> If said software has no tools to "walk" said
>> data or if it's impractical to have it do so you're at severe risk of
>> being hosed.
>
> Umm what?  I'm talking about a userland (libzfs) tool (i.e. doesn't need
> the pool imported) such as zfs send (which requires the pool to be
> imported - hence me not calling it a userland tool) to allow a sending of
> data that can be found to other places where it can be either blindly
> recovered (corruption might be present) or can be used to locate
> files/paths etc that are known to be good (checksums match etc)..
walk
> the structures, feed the data elsewhere where it can be
> examined/recovered... don't alter it.... it's a last resort tool when you
> don't have working backups..

See above.

>> BTW if you've never had a UFS volume unlink all the blocks within a file
>> on an fsck and then recover them back into the free list after a crash
>> you're a rare bird indeed.  If you think a corrupt ZFS volume is fun try
>> to get your data back from said file after that happens.
>
> Been there done that, though with ext2 rather than UFS.. still got all my
> data back... even though it was a nightmare..

Is that an implication that, had all your data been on UFS (or ext2 :) this
time around, you would have got it all back?  (I've got that impression
through this thread from things you've written.)  That sort of makes it
sound like UFS is bulletproof to me.  There are levels of corruption.
Maybe what you suffered would have taken down UFS, too?  I guess there's no
way to know unless there's some way you can recreate exactly the
circumstances that took down your original system (but this time with your
data on UFS). ;-)

Cheers,
Paul.
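The forensic approach discussed above (walking on-disk structures without importing the pool, as zdb and the Sleuth Kit extension do) typically starts by locating structures via their magic numbers. Below is a toy sketch of that first step only, assuming the well-known ZFS uberblock magic value 0x00bab10c; the scan granularity is my assumption, and a real tool would go on to parse vdev labels, checksums, and block pointers:

```python
import struct

# ZFS uberblocks begin with the magic number 0x00bab10c ("oo-ba-bloc").
# A forensic tool can scan a raw disk image for this value to locate
# candidate uberblocks without ever importing the pool.
UBERBLOCK_MAGIC = 0x00bab10c

def find_uberblock_candidates(image: bytes, step: int = 1024):
    """Yield offsets whose first 8 bytes decode to the uberblock magic in
    either byte order (pools may be written big- or little-endian)."""
    for off in range(0, len(image) - 8 + 1, step):
        (le,) = struct.unpack_from("<Q", image, off)
        (be,) = struct.unpack_from(">Q", image, off)
        if UBERBLOCK_MAGIC in (le, be):
            yield off

# Fake "disk image": zeros with one magic value planted at offset 4096.
image = bytearray(16 * 1024)
struct.pack_into("<Q", image, 4096, UBERBLOCK_MAGIC)
print(list(find_uberblock_candidates(bytes(image))))  # -> [4096]
```

On a real pool member you would run something like `zdb -l /dev/da0p3` to dump the labels instead; the sketch just shows why such read-only examination is possible even when import fails.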