I know I'm not going to be popular for this, but I'll just drop it here anyhow.

http://www.michellesullivan.org/blog/1726

Perhaps one should reconsider either:

1. Looking at tools that may be able to recover corrupt ZFS metadata, or
2. Defaulting to non-ZFS filesystems on install.

--
Michelle Sullivan
http://www.mhix.org/
On Mon, Apr 29, 2019 at 10:23 AM Michelle Sullivan <michelle at sorbs.net> wrote:
>
> I know I'm not going to be popular for this, but I'll just drop it here
> anyhow.
>
> http://www.michellesullivan.org/blog/1726
>
> Perhaps one should reconsider either:
>
> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or
> 2. Defaulting to non-ZFS filesystems on install.
>
> --
> Michelle Sullivan
> http://www.mhix.org/

Wow, losing multiple TB sucks for anybody. I'm sorry for your loss. But I want to respond to a few points from the blog post.

1) When ZFS says that "the data is always correct and there's no need for fsck", it means metadata as well as data. The spacemap is protected in exactly the same way as all other data and metadata (to be pedantically correct, the labels and uberblocks are protected in a different way, but still protected). The only way to get metadata corruption is a disk failure (a 3-disk failure when using RAIDZ2) or a software bug. Sadly, those do happen, and they're devilishly tricky to track down. The difference between ZFS and older filesystems is that older filesystems experience corruption during power loss _by_design_, not merely due to software bugs. A perfectly functioning UFS implementation will experience corruption during power loss, and that's why it needs to be fscked. It's not just theoretical, either. I use UFS on my development VMs, and they frequently experience corruption after a panic (which happens all the time because I'm working on kernel code).

2) Backups are essential with any filesystem, not just ZFS. After all, no amount of RAID will protect you from an accidental "rm -rf /".

3) ZFS hot spares can be swapped in automatically, though they aren't by default. It sounds like you already figured out how to assign a spare to the pool. To use it automatically, you must set the "autoreplace" pool property and enable zfsd. The latter can be done with 'sysrc zfsd_enable="YES"'. (A short sketch follows below.)

4) It sounds like you're having a lot of power trouble. Have you tried sysutils/apcupsd from ports? It's fairly handy. It can talk to a wide range of UPSes and can be configured to do things like send you an email on power loss and power down the server if the battery gets too low.

Better luck next time,
-Alan
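(A minimal sketch of the spare/zfsd setup from point 3 and the apcupsd suggestion from point 4; the pool name "tank" and the disk name "da4" are hypothetical examples, not details from the thread:)

  # Attach a hot spare to the pool.
  zpool add tank spare da4

  # Let ZFS activate the spare automatically when a member disk fails.
  zpool set autoreplace=on tank

  # Enable and start zfsd, which performs the automatic replacement.
  sysrc zfsd_enable="YES"
  service zfsd start

  # Optional: UPS monitoring with graceful shutdown on low battery.
  pkg install apcupsd
  sysrc apcupsd_enable="YES"
  service apcupsd start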
Hi!

> I know I'm not going to be popular for this, but I'll just drop it here
> anyhow.
>
> http://www.michellesullivan.org/blog/1726

With all due respect, if that filesystem/server you describe has not kept up with all those mishaps, then it's not perfect; but nothing is.

> Perhaps one should reconsider either:
>
> 2. Defaulting to non ZFS filesystems on install.

I have had more cases of UFS being toast than ZFS so far.

> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or

Here I agree! Making tools available to dig around zombie zpools, icky as that is in itself, would be helpful!

--
pi at opsec.eu            +49 171 3101372              One year to go !
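(For read-only digging, zdb(8) can already inspect a pool that won't import; a rough sketch, with the pool name "tank" and the device path as hypothetical examples, and none of it repairs anything:)

  # Dump the vdev labels and uberblocks from a disk that belonged to the pool.
  zdb -l /dev/da0p3

  # Walk an exported or un-importable pool's datasets without importing it.
  zdb -e -d tank

  # Last resort: ask zpool to roll back to an older transaction group on import.
  zpool import -F tank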
Your story is so unusual that I am wondering if it's not fiction: all sorts of power cuts where it just so happens the UPS fails every time, then you decide to ship a server halfway round the world, and on top of that you get a way above average rate of hard drive failures. But aside from all this, you managed to recover multiple times.

ZFS has never claimed to be a get-out-of-jail-free card, but it did survive multiple times in your case. I suggest, though, that if you value redundancy you do not use RAIDZ but use mirrors instead (see the sketch after the quoted message below). I don't know why people keep persisting with RAID 5/6 nowadays, with drives as large as they are.

I have used ZFS since the days of FreeBSD 8.x, and its resilience compared to the likes of ext, and especially compared to UFS, is astounding. Before marking it down, think about how UFS or ext would have managed the scenarios you presented in your blog. Also think about where you are hosting your data, given all your power failures, and about the UPS equipment you utilise as well.

On Mon, 29 Apr 2019 at 16:26, Michelle Sullivan <michelle at sorbs.net> wrote:
>
> I know I'm not going to be popular for this, but I'll just drop it here
> anyhow.
>
> http://www.michellesullivan.org/blog/1726
>
> Perhaps one should reconsider either:
>
> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or
> 2. Defaulting to non ZFS filesystems on install.
>
> --
> Michelle Sullivan
> http://www.mhix.org/
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
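(To make the mirror suggestion concrete, a pool of striped mirror pairs is created roughly like this; the pool and disk names are hypothetical:)

  # Two mirrored pairs striped together: any single disk in a pair can fail,
  # and a resilver only has to read from the surviving half of that pair.
  zpool create tank mirror da0 da1 mirror da2 da3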