Michelle Sullivan
http://www.mhix.org/
On 09 May 2019, at 01:55, Walter Parker <walterp at gmail.com> wrote:
>>
>>
>> ZDB (unless I'm misreading it) is able to find all 34m+ files and
>> verifies the checksums. The problem is in the zfs data structures (one
>> definitely, two maybe, metaslabs fail checksums preventing the mounting
>> (even read-only) of the volumes.)
>>
>>> Especially, how do you know
>>> before you recovered the data from the drive.
>> See above.
>>
>>> As ZFS meta data is stored
>>> redundantly on the drive and never in an inconsistent form (that is what
>>> fsck does, it fixes the inconsistent data that most other filesystems store
>>> when they crash/have disk issues).
>> The problem - unless I'm reading zdb incorrectly - is limited to the
>> structure rather than the data. This fits with the fact the drive was
>> isolated from user changes when the drive was being resilvered so the
>> data itself was not being altered .. that said, I am no expert so I
>> could easily be completely wrong.
>>
> What it sounds like you need is a meta data fixer, not a file recovery
> tool.
This is true, but I agree with the ZFS devs that this might not be a good
idea... if ZFS can't work it out already, the best thing to do will probably be
to get everything off it and reformat...
> Assuming the meta data can be fixed that would be the easy route.
That's the thing... I don't know if it can be easily fixed... rather, I think the
meta data can probably be easily fixed, but I suspect the spacemap can't, and if
it can't there will be one of two things... either a big hole (or multiple
little ones), or the likelihood of new data partially or fully overwriting old
data, and this would not be good..
> That should not be hard to write if everything else on the disk has no
> issues. Didn't you say in another message that the system is now returning
> 100's of drive errors.
No, one disk in the 16 disk RAIDZ2 ... previously unseen, but it could be the
errors have occurred in the last 6 weeks... every time I reboot it starts
resilvering, gets to 761M resilvered and then stops.
> How does that relate to the statement =>Everything on
> the disk is fine except for a little bit of corruption in the freespace map?
Well, I think it goes through until it hits that little bit of corruption that
stops it mounting... then stops again..
I'm seeing 100s of hard errors at the beginning of one of the drives.. they were
reported in syslog, but only just, so could be a new thing. Could be previously
undetected.. no way to know.
>
>
>>
>>>
>>> I have a friend/business partner that doesn't want to move to ZFS because
>>> his recovery method is wait for a single drive (no-redundancy, sometimes no
>>> backup) to fail and then use ddrescue to image the broken drive to a new
>>> drive (ignoring any file corruption because you can't really tell without
>>> ZFS). He's been using disk rescue programs for so long that he will not
>>> move to ZFS, because it doesn't have a disk rescue program.
>>
>> The first part is rather cavalier .. the second part I kinda
>> understand... it's why I'm now looking at alternatives ... particularly
>> being bitten as badly as I have with an unmountable volume.
>>
> On the system I managed for him, we had a system with ZFS crap out. I
> restored it from a backup. I continue to believe that people running
> systems without backups are living on borrowed time. The idea of relying on
> a disk recovery tool is too risky for my taste.
>
>
>>> He has systems
>>> on Linux with ext3 and no mirroring or backups. I've asked about moving
>>> them to a mirrored ZFS system and he has told me that the customer doesn't
>>> want to pay for a second drive (but will pay for hours of his time to fix
>>> the problem when it happens). You kind of sound like him.
>> Yeah..no! I'd be having that on a second (mirrored) drive... like most
>> of my production servers.
>>
>>> ZFS is risky
>>> because there isn't a good drive rescue program.
>> ZFS is good for some applications. ZFS is good to prevent cosmic ray
>> issues. ZFS is not good when things go wrong. ZFS doesn't usually go
>> wrong. Think that about sums it up.
>>
> When it does go wrong I restore from backups. Therefore my systems don't
> have problems. I'm sorry you had the perfect trifecta that caused you to lose
> multiple drives and all your backups at the same time.
>
>
>>> Sun's design was that the
>>> system should be redundant by default and checksum everything. If the
>>> drives fail, replace them. If they fail too much or too fast, restore from
>>> backup. Once the system had too much corruption, you can't recover/check
>>> for all the damage without a second off disk copy. If you have that off
>>> disk, then you have backup. They didn't build for the standard use case as
>>> found in PCs because the disk recovery programs rarely get everything back,
>>> therefore they can't be relied on to get your data back when your data is
>>> important. Many PC owners have brought PC mindset ideas to the "UNIX"
>>> world. Sun's history predates Windows and Mac and comes from a
>>> Mini/Mainframe mindset (where people tried not to guess about data
>>> integrity).
>> I came from the days of Sun.
>>
> Good, then you should understand Sun's point of view.
>
>
>>>
>>> Would a disk rescue program for ZFS be a good idea? Sure. Should the lack
>>> of a disk recovery program stop you from using ZFS? No. If you think so, I
>>> suggest that you have your data integrity priorities in the wrong order
>>> (focusing on small, rare events rather than the common base case).
>> Common case in your assessment in the email would suggest backups are
>> not needed unless you have a rare event of a multi-drive failure. Which
>> I know you're not advocating, but it is this same circular argument...
>> ZFS is so good it's never wrong we don't need no stinking recovery
>> tools, oh but take backups if it does fail, but it won't because it's so
>> good and you have to be running consumer hardware or doing something
>> wrong or be very unlucky with failures... etc.. round and round we go,
>> where ever she'll stop no-one knows.
>>
> I advocate 2-3 backups of any important system (at least one different
> than the others, offsite if one can afford it).
> I never said ZFS is so good we don't need backups (that would be a stupid
> comment). As far as a recovery tool, those sound risky. I'd prefer
> something without so much risk.
>
> Make your own judgement, it is your time and data. I think ZFS is a great
> filesystem that anyone using FreeBSD or illumos should be using.
>
>
> --
> The greatest dangers to liberty lurk in insidious encroachment by men of
> zeal, well-meaning but without understanding. -- Justice Louis D. Brandeis