Karl Denninger wrote:
> On 4/30/2019 05:14, Michelle Sullivan wrote:
>>> On 30 Apr 2019, at 19:50, Xin LI <delphij at gmail.com> wrote:
>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle at sorbs.net> wrote:
>>>> but in my recent experience 2 issues colliding at the same time results in disaster
>>> Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated.
>>>
>>> All I know is it's a checksum error on a metaslab (122), and from what I can gather it's the spacemap that is corrupt... but I am no expert. I don't believe it's a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged, nor were the drives or controller.
> .....
>>> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with a faulty, single hard drive pool: bad enough to crash almost monthly and damage my data from time to time),
>> This was a top-end consumer grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it wasn't for patching.
> Yuck.
>
> I'm sorry, but that may well be what nailed you.
>
> ECC is not just about the random cosmic ray. It also saves your bacon
> when there are power glitches.

No. Sorry, no. If the data is only half to disk, ECC isn't going to save you at all... it's all about power on the drives to complete the write.

> Unfortunately however there is also cache memory on most modern hard
> drives; most of the time (unless you explicitly shut it off) it's on for
> write caching, and it'll nail you too. Oh, and it's never, in my
> experience, ECC.

No comment on that - you're right on the first part; I can't comment on whether there are drives with ECC cache.

> In addition, however, and this is something I learned a LONG time ago
> (think Z-80 processors!): as in so many very important things,
> "two is one and one is none."
>
> In other words, without a backup you WILL lose data eventually, and it
> WILL be important.
>
> Raidz2 is very nice, but as the name implies you have two
> redundancies. If you take three errors, or if, God forbid, you *write*
> a block that has a bad checksum in it because it got scrambled while in
> RAM, you're dead if that happens in the wrong place.

Or in my case you write partial data, therefore invalidating the checksum...

>> Yeah.. unlike UFS, which has to get really, really hosed before you're restoring from backup with nothing recoverable, it seems ZFS can get hosed when issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable.)
>>
>> Michelle
> Oh that is definitely NOT true.... again, from hard experience,
> including (but not limited to) on FreeBSD.
>
> My experience is that ZFS is materially more resilient, but there is no
> such thing as "can never be corrupted by any set of events."

The latter part is true - and my blog and my current situation are not limited to or aimed at FreeBSD specifically; FreeBSD is just my experience. The former part... it has been very resilient, but I think (based on this particular set of events) it is easily corruptible and I have just been lucky. You just have to hit a certain write to trigger the issue, and whilst that write and issue might be very, very difficult (read: hit and miss) to hit in normal everyday scenarios, it can and will eventually happen.

> Backup
> strategies for moderately large (e.g. many Terabytes) to very large
> (e.g. Petabytes and beyond) get quite complex, but they're also very
> necessary.

...and therein lies the problem. If you don't have a many-tens-of-thousands-of-dollars backup solution, you're either:

1/ down for a looooong time, or
2/ losing all data and starting again...

..and that's the problem... with UFS you can recover most data (in most situations), and providing the *data* is there, uncorrupted by the fault, you can get it all off with various tools even if it is a complete mess.... here I am with data that is apparently ok, but metadata that is corrupt (and note: as I had stopped writing to the pool when it started resilvering, the data - all of it - should be intact... even if a mess.)

Michelle

--
Michelle Sullivan
http://www.mhix.org/
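[For readers who want to see this kind of metaslab/spacemap damage for themselves, a minimal read-only sketch. The pool name "storage" is an assumption, and the exact zdb output format varies between ZFS versions:]

    # Read-only inspection; none of these commands write to the pool.
    zpool status -v storage    # which vdevs have logged checksum errors
    zdb -m storage             # per-metaslab offset, spacemap object, free space
    zdb -mmm storage           # dump individual spacemap records (slow on big pools)
    # If the pool no longer imports cleanly, a read-only rewind import
    # discards the last few transaction groups rather than the pool:
    zpool import -o readonly=on -F storage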
On Tue, Apr 30, 2019 at 7:30 AM Michelle Sullivan <michelle at sorbs.net> wrote:
> Karl Denninger wrote:
> > ECC is not just about the random cosmic ray. It also saves your bacon
> > when there are power glitches.
>
> No. Sorry, no. If the data is only half to disk, ECC isn't going to save
> you at all... it's all about power on the drives to complete the write.

ECC RAM isn't about saving the last few seconds' worth of data from before a power crash. It's about not corrupting the data that gets written long before a crash. If you have non-ECC RAM, then a cosmic ray/alpha particle/row-hammer attack/bad luck can corrupt data after it's been checksummed but before it gets DMAed to disk. The disk will then contain corrupt data, and you won't know it until you try to read it back.

> > Unfortunately however there is also cache memory on most modern hard
> > drives; most of the time (unless you explicitly shut it off) it's on for
> > write caching, and it'll nail you too. Oh, and it's never, in my
> > experience, ECC.

Fortunately, ZFS never sends non-checksummed data to the hard drive, so an error in the hard drive's cache RAM will usually get detected by the ZFS checksum.

-Alan

> [...]
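[Alan's scenario - data corrupted after checksumming but before the DMA - is exactly what a periodic scrub surfaces: the on-disk block no longer matches the checksum in its block pointer, and the mismatch is reported before you next need the data. A minimal sketch, again assuming a pool named "storage":]

    # Re-read every allocated block and verify it against its checksum;
    # runs in the background, errors appear per-vdev in zpool status.
    zpool scrub storage
    zpool status -v storage    # progress, plus any files found damaged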
> On 30 Apr 2019, at 15:30, Michelle Sullivan <michelle at sorbs.net> wrote:
>
>> I'm sorry, but that may well be what nailed you.
>>
>> ECC is not just about the random cosmic ray. It also saves your bacon
>> when there are power glitches.
>
> No. Sorry, no. If the data is only half to disk, ECC isn't going to save you at all... it's all about power on the drives to complete the write.

Not necessarily. Depending on the power outage, things can get really funny during the power loss event. 25+ years ago I witnessed a severe 2-second voltage drop, and during that time the hard disk in our SCO Unix server went really crazy. Even the low-level format was corrupted; the damage was way beyond mere filesystem corruption.

During the start of a power outage (especially when it's not a clean power cut, but comes preceded by some voltage swings) data corruption can be extensive. As far as I know, high-end systems include power management elements to reduce the impact. I have other war stories about UPS systems providing an extremely dirty waveform and causing format problems in disks. That happened in 1995 or so.

>> Unfortunately however there is also cache memory on most modern hard
>> drives; most of the time (unless you explicitly shut it off) it's on for
>> write caching, and it'll nail you too. Oh, and it's never, in my
>> experience, ECC.
>
> No comment on that - you're right on the first part; I can't comment on whether there are drives with ECC.

Even with cache corruption, ZFS being transaction oriented should offer a reasonable guarantee of integrity. You may lose one minute, five minutes of changes, but there should be stable, committed data on the disk. Unless the electronics went insane for some milliseconds during the outage event (see above).

> ..and that's the problem... with UFS you can recover most data (in most situations) [...] here I am with data that is apparently ok, but the metadata is corrupt [...]

The advantage of ZFS is that it makes it feasible to replicate data. If you keep a mirror storage server, your disaster recovery won't require restoring a full backup (which can take an inordinate amount of time), just reconfiguring the replica server to assume the role of the master. Again, being transaction based somewhat reduces the likelihood of a software bug on the master propagating to the slave and causing extensive corruption. Rewinding to the previous snapshot should help.

Borja.
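[The replicate-and-rewind pattern Borja describes, as a minimal sketch. Host and dataset names are assumptions, and the snapshot schedule is illustrative only:]

    # One-time seed: send a full replication stream to a standby machine.
    zfs snapshot -r storage@replica-base
    zfs send -R storage@replica-base | ssh standby zfs receive -u backup/storage

    # If a fault on the master has propagated, rewind the replica to the
    # last known-good snapshot instead of restoring from a full backup:
    ssh standby zfs rollback -r backup/storage@replica-base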
Brief "Old Man" summary/perspective here... Computers and hard drives are complex, sensitive physical things. They, or the data on them, can be lost to fire, flood, lightning strikes, theft, transportation screw-ups, and more. Mass data corruption by faulty hardware or software is mostly rare, but does happen. Then there's the users - authorized or not - who are inept or malicious. You can spent a fortune to make loss of the "live" data in your home server / server room / data center very unlikely. Is that worth the time and money? Depends on the business case. At any scale, it's best to have a manager - who understands both computers and the bottom line - keep a close eye on this. "Real" protection from data loss means multiple off-site and generally off-line backups. You could spend a fortune on that, too...but for your use case (~21TB in an array that could hold ~39TB, and what sounds like a "home power user" budget), I'd say to put together two "backup servers" - cheap little (aka transportable) FreeBSD systems with, say 7x6GB HD's, raidz1. With even a 1Gbit ethernet connection to your main system, savvy use of (say) rsync (net/rsync in Ports), and the sort of "know your data / divide & conquer" tactics that Karl mentions, you should be able to complete initial backups (on both backup servers) in <1 month. After that - rsync can generally do incremental backups far, far faster. How often you gently haul the backup servers to/from your off-site location(s) depends on a bunch of factors - backup frequency, cost of bandwidth, etc. Never skimp on power supplies. -Walter [Credits: Nothing above is original. Others have already made most of my points in this thread. It's pretty much all decades-old computer wisdom in any case.] On Tue, 30 Apr 2019, Michelle Sullivan wrote:> Karl Denninger wrote: >> On 4/30/2019 05:14, Michelle Sullivan wrote: >>>> On 30 Apr 2019, at 19:50, Xin LI <delphij at gmail.com> wrote: >>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle at sorbs.net> > wrote: >>>>> but in my recent experience 2 issues colliding at the same time results > in disaster >>>> Do we know exactly what kind of corruption happen to your pool? If you > see it twice in a row, it might suggest a software bug that should be > investigated. >>>> >>>> All I know is it???s a checksum error on a meta slab (122) and from what > I can gather it???s the spacemap that is corrupt... but I am no expert. I > don???t believe it???s a software fault as such, because this was cause by a > hard outage (damaged UPSes) whilst resilvering a single (but completely > failed) drive. ...and after the first outage a second occurred (same as the > first but more damaging to the power hardware)... the host itself was not > damaged nor were the drives or controller. >> ..... >>>> Note that ZFS stores multiple copies of its essential metadata, and in my > experience with my old, consumer grade crappy hardware (non-ECC RAM, with > several faulty, single hard drive pool: bad enough to crash almost monthly > and damages my data from time to time), >>> This was a top end consumer grade mb with non ecc ram that had been > running for 8+ years without fault (except for hard drive platter failures.). > Uptime would have been years if it wasn???t for patching. >> Yuck. >> >> I'm sorry, but that may well be what nailed you. >> >> ECC is not just about the random cosmic ray. It also saves your bacon >> when there are power glitches. > > No. Sorry no. 
If the data is only half to disk, ECC isn't going to save > you at all... it's all about power on the drives to complete the write. >> >> Unfortunately however there is also cache memory on most modern hard >> drives, most of the time (unless you explicitly shut it off) it's on for >> write caching, and it'll nail you too. Oh, and it's never, in my >> experience, ECC. > > No comment on that - you're right in the first part, I can't comment if > there are drives with ECC. > >> >> In addition, however, and this is something I learned a LONG time ago >> (think Z-80 processors!) is that as in so many very important things >> "two is one and one is none." >> >> In other words without a backup you WILL lose data eventually, and it >> WILL be important. >> >> Raidz2 is very nice, but as the name implies it you have two >> redundancies. If you take three errors, or if, God forbid, you *write* >> a block that has a bad checksum in it because it got scrambled while in >> RAM, you're dead if that happens in the wrong place. > > Or in my case you write part data therefore invalidating the checksum... >> >>> Yeah.. unlike UFS that has to get really really hosed to restore from > backup with nothing recoverable it seems ZFS can get hosed where issues occur > in just the wrong bit... but mostly it is recoverable (and my experience has > been some nasty shit that always ended up being recoverable.) >>> >>> Michelle >> Oh that is definitely NOT true.... again, from hard experience, >> including (but not limited to) on FreeBSD. >> >> My experience is that ZFS is materially more-resilient but there is no >> such thing as "can never be corrupted by any set of events." > > The latter part is true - and my blog and my current situation is not > limited to or aimed at FreeBSD specifically, FreeBSD is my experience. > The former part... it has been very resilient, but I think (based on > this certain set of events) it is easily corruptible and I have just > been lucky. You just have to hit a certain write to activate the issue, > and whilst that write and issue might be very very difficult (read: hit > and miss) to hit in normal every day scenarios it can and will > eventually happen. > >> Backup >> strategies for moderately large (e.g. many Terabytes) to very large >> (e.g. Petabytes and beyond) get quite complex but they're also very >> necessary. >> > and there in lies the problem. If you don't have a many 10's of > thousands of dollars backup solutions, you're either: > > 1/ down for a looooong time. > 2/ losing all data and starting again... > > ..and that's the problem... ufs you can recover most (in most > situations) and providing the *data* is there uncorrupted by the fault > you can get it all off with various tools even if it is a complete > mess.... here I am with the data that is apparently ok, but the > metadata is corrupt (and note: as I had stopped writing to the drive > when it started resilvering the data - all of it - should be intact... > even if a mess.) > > Michelle > > -- > Michelle Sullivan > http://www.mhix.org/ > > _______________________________________________ > freebsd-stable at freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org" >
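[The incremental pass Walter mentions, as a minimal sketch. The source path, destination host, and option choices are assumptions, not a prescription:]

    # Mirror the live data to a backup server; after the first full copy,
    # only files that changed since the last run are transferred.
    # -a preserves permissions/times, -H hard links; -x stays on one
    # filesystem; --partial keeps interrupted transfers resumable.
    rsync -aHx --delete --partial /storage/ backup1:/backup/storage/

[Run periodically, each pass costs roughly the size of the changes plus a full file-tree walk - the walk being what the ZFS send/receive approach in the next reply avoids.]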
Hi!

> On 30.04.2019 at 18:07, Walter Cramer <wfc at mintsol.com> wrote:
> With even a 1Gbit Ethernet connection to your main system, savvy use of (say) rsync (net/rsync in Ports), and the sort of "know your data / divide & conquer" tactics that Karl mentions, you should be able to complete initial backups (on both backup servers) in <1 month. After that - rsync can generally do incremental backups far, far faster.

ZFS can do incremental snapshots and send/receive much faster than rsync working at the file level. And e.g. FreeNAS comes with all the bells and whistles already in place - just a matter of point and click to replicate one set of datasets on one server to another one.

*Local* replication is a piece of cake today, if you have the hardware.

Kind regards,
Patrick

--
punkt.de GmbH
Internet - Dienstleistungen - Beratung

Kaiserallee 13a   Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe   info at punkt.de   http://punkt.de
AG Mannheim 108285   Gf: Juergen Egeling
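[Patrick's point, as a minimal sketch: an incremental stream carries only the blocks that changed between two snapshots, so its cost scales with the delta rather than with the number of files rsync would have to stat. Dataset, host, and snapshot names are assumptions:]

    # The previous snapshot must already exist on both sides from the
    # last run; only blocks changed since then cross the wire.
    zfs snapshot -r storage@2019-05-01
    zfs send -R -i storage@2019-04-30 storage@2019-05-01 | \
        ssh backup1 zfs receive -Fu backup/storage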