Comments inline.
Michelle Sullivan
http://www.mhix.org/
Sent from my iPad
> On 30 Apr 2019, at 03:06, Alan Somers <asomers at freebsd.org> wrote:
>
>> On Mon, Apr 29, 2019 at 10:23 AM Michelle Sullivan <michelle at sorbs.net> wrote:
>>
>> I know I'm not going to be popular for this, but I'll just drop it here
>> anyhow.
>>
>> http://www.michellesullivan.org/blog/1726
>>
>> Perhaps one should reconsider either:
>>
>> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or
>> 2. Defaulting to non ZFS filesystems on install.
>>
>> --
>> Michelle Sullivan
>> http://www.mhix.org/
>
> Wow, losing multiple TB sucks for anybody. I'm sorry for your loss.
> But I want to respond to a few points from the blog post.
>
> 1) When ZFS says that "the data is always correct and there's no need
> for fsck", they mean metadata as well as data. The spacemap is
> protected in exactly the same way as all other data and metadata. (to
> be pedantically correct, the labels and uberblocks are protected in a
> different way, but still protected). The only way to get metadata
> corruption is due a disk failure (3-disk failure when using RAIDZ2),
> or due to a software bug. Sadly, those do happen, and they're
> devilishly tricky to track down. The difference between ZFS and older
> filesystems is that older filesystems experience corruption during
> power loss _by_design_, not merely due to software bugs. A perfectly
> functioning UFS implementation will experience corruption during power
> loss, and that's why it needs to be fscked. It's not just
> theoretical, either. I use UFS on my development VMs, and they
> frequently experience corruption after a panic (which happens all the
> time because I'm working on kernel code).
I know, which is why I have ZVOLs with UFS filesystems in them for the
development VMs... in a perfect world the power would have been all good, the
UPSes would not be damaged, and the generator would not run out of fuel because
of the extended outage... in fact, if it were a perfect world I wouldn't have my
own mini DC at home.
>
> 2) Backups are essential with any filesystem, not just ZFS. After
> all, no amount of RAID will protect you from an accidental "rm -rf /".
You only do it once... I did it back in 1995... haven't ever done it again.
>
> 3) ZFS hotspares can be swapped in automatically, though they aren't by
> default. It sounds like you already figured out how to assign a spare
> to the pool. To use it automatically, you must set the "autoreplace"
> pool property and enable zfsd. The latter can be done with "sysrc
> zfsd_enable="YES"".
The system was originally built on 9.0 and got upgraded throughout the
years... zfsd was not available back then. So I get your point, but maybe you
didn't realize this blog covers a history of 8+ years?
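For anyone else following the thread, the autoreplace/zfsd setup described above looks roughly like this on a current FreeBSD system (the pool name "tank" and spare disk "da4" are placeholders, not from the original posts):

```shell
# Attach a hot spare to the pool (da4 is a placeholder device).
zpool add tank spare da4

# Allow ZFS to activate the spare automatically on device failure.
zpool set autoreplace=on tank

# Enable and start zfsd, which performs the automatic replacement.
sysrc zfsd_enable="YES"
service zfsd start

# Verify: the spare should show under the pool's "spares" section.
zpool status tank
```

None of this existed in the 9.0-era tooling, of course; zfsd only shipped later.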
>
> 4) It sounds like you're having a lot of power trouble. Have you
> tried sysutils/apcupsd from ports?
I did... Malta was notorious for it. Hence the 6kVA UPSes in the bottom of each
rack (4 racks), cross-connected with the rack next to it, and a backup
generator... Australia, on the other hand, is a lot more stable (at least where
I am)... 2 power issues in 2 years... both within 10 hours... one was a
transformer, the other when some idiot took out a power pole (and I mean
actually took it out, it was literally snapped in half... how they got out of
the car and did a runner before the police or ambos got there I'll never know).
> It's fairly handy. It can talk to
> a wide range of UPSes, and can be configured to do stuff like send you
> an email on power loss, and power down the server if the battery gets
> too low.
>
That could have helped... but all 4 UPSes are toast now. One caught fire, one
no longer detects AC input, and the other two I'm not even trying after the
first one catching fire... the lot are being replaced on insurance.

It's a catalog of errors that most wouldn't normally experience. However, it
does show (to me) that ZFS on everything is a really bad idea... particularly
for home users, where there is unknown hardware and you know they will mistreat
it... they certainly won't have ECC RAM in laptops etc., unknown caching
behaviour etc... it's a recipe for losing the root drive...
Regards,
Michelle