Michelle Sullivan
http://www.mhix.org/
Sent from my iPad
> On 30 Apr 2019, at 19:50, Xin LI <delphij at gmail.com> wrote:
>
>
>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle at sorbs.net> wrote:
>> but in my recent experience 2 issues colliding at the same time results
>> in disaster
>
> Do we know exactly what kind of corruption happened to your pool? If you see
> it twice in a row, it might suggest a software bug that should be investigated.
All I know is it's a checksum error on a metaslab (122), and from what I can
gather it's the spacemap that is corrupt... but I am no expert. I don't believe
it's a software fault as such, because this was caused by a hard outage (damaged
UPSes) whilst resilvering a single (but completely failed) drive. ...and after
the first outage a second occurred (same as the first but more damaging to the
power hardware)... the host itself was not damaged, nor were the drives or
controller.
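For reference, the per-metaslab space maps can be dumped with zdb without
importing the pool, which at least shows which metaslab trips the checksum.
A sketch only, assuming the pool is exported and named 'storage' as above:

  # summary of each metaslab and its space map, run against the exported pool
  zdb -e -m storage
  # walk the space maps themselves for more detail
  zdb -e -mm storage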
>
> Note that ZFS stores multiple copies of its essential metadata, and in my
> experience with my old, consumer-grade crappy hardware (non-ECC RAM, with
> several faulty single-hard-drive pools: bad enough to crash almost monthly and
> damage my data from time to time),
This was a top-end consumer-grade motherboard with non-ECC RAM that had been
running for 8+ years without fault (except for hard drive platter failures).
Uptime would have been years if it wasn't for patching.
> I've never seen a corruption this bad and I was always able to recover
> the pool.
So far, same.
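(For context: ZFS already keeps extra ditto copies of pool and filesystem
metadata on its own; the only knob for adding redundancy to data on a
single-vdev pool is the copies property, and it only protects blocks written
after it is set. A sketch, with a made-up dataset name:

  # keep two copies of every data block written to this dataset from now on
  zfs set copies=2 storage/important
  zfs get copies storage/important
)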
> At my previous employer, the only case where we had the pool corrupted to the
> point that mount was not allowed was because two host nodes happened to
> import the pool at the same time, which is a situation that can be avoided with
> SCSI reservation; their hardware was of much better quality, though.
>
> Speaking of a tool like 'fsck': I think I'm mostly convinced
> that it's not necessary, because at the point ZFS says the metadata is
> corrupted, it means that the metadata was really corrupted beyond repair (all
> replicas were corrupted; otherwise it would recover by finding the right
> block and rewriting the bad ones).
I see this message all the time and mostly agree... actually I do agree, with
possibly a minor exception, but one so minor it's probably not worth it. However,
as I suggested in my original post: the pool says the files are there, so a tool
that would send them (a la zfs send) while ignoring errors in the spacemaps etc.
would be really useful (to me).
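One partial substitute that exists today: a read-only import does no new
allocations, so in principle a damaged space map may never need to be loaded,
and the files can then be copied off with userland tools (zfs send needs an
existing snapshot, which a read-only pool can't create). A sketch only; the
paths, and whether this works on this particular pool, are assumptions:

  # import read-only under an alternate root; -f in case of a stale hostid
  zpool import -o readonly=on -f -R /recovery storage
  # copy the data off rather than zfs send, since no new snapshot can be taken
  rsync -a /recovery/storage/ /backup/storage/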
>
> An interactive tool may be useful (e.g. "I saw data structure versions
> 1, 2, 3 available, all with bad checksums; choose which one you would want to
> try"), but I think it wouldn't be very practical for use with large
> data pools -- unlike traditional filesystems, ZFS uses copy-on-write and heavily
> depends on the metadata to find where the data is, and a regular
> "scan" is not really useful.
zdb -AAA showed (shows) 36M files, which suggests the data is intact, but the
mount aborts with an I/O error because it says the metadata has three errors: two
"metadata" and one "<storage:0x0>" (storage being the pool name). It does
import, and it attempts to resilver, but reports the resilver finishing at some
780M (ish)... export/import and it does it all again... zdb without -AAA aborts
loading metaslab 122.
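The steps usually suggested for this symptom, for whatever they are worth here
(hedged: whether a rewind gets past a bad space map is not guaranteed, and the
FreeBSD tunable is an assumption on my part that it applies to this failure):

  # attempt to rewind to an earlier txg, read-only so nothing is rewritten
  zpool import -F -o readonly=on storage
  # FreeBSD tunable that relaxes some fatal import-time assertions
  sysctl vfs.zfs.recover=1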
>
> I'd agree that you need a full backup anyway, regardless of what storage
> system is used, though.
Yeah... unlike UFS, which has to get really, really hosed before you're restoring
from backup with nothing recoverable, it seems ZFS can get hosed when issues occur
in just the wrong bit... but mostly it is recoverable (and my experience has been
some nasty shit that always ended up being recoverable).
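And for the archives, the usual shape of that full backup on a healthy pool, as
a sketch (the snapshot and target pool names are placeholders):

  # recursive snapshot of every dataset in the pool
  zfs snapshot -r storage@backup-20190430
  # replicate the whole pool, properties included, to another pool
  zfs send -R storage@backup-20190430 | zfs receive -Fdu backuppool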
Michelle