thr3ads.net - freebsd stable - ZFS... [May 2019]

Michelle Sullivan

2019-May-01 14:53 UTC

ZFS...

Paul Mather wrote:
> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan <michelle at sorbs.net>
> wrote:
>
>> Been there done that though with ext2 rather than UFS..  still got 
>> all my data back... even though it was a nightmare..
>
>
> Is that an implication that had all your data been on UFS (or ext2:) 
> this time around you would have got it all back?  (I've got that 
> impression through this thread from things you've written.) That sort 
> of makes it sound like UFS is bulletproof to me.
It's definitely not bulletproof (far from it) - however when the 
data on disk is not corrupt I have managed to recover it - even if it 
has been a nightmare - no structure - all files in lost+found etc... or 
even resorting to r-studio in the event of lost raid information etc..
>
> There are levels of corruption.  Maybe what you suffered would have 
> taken down UFS, too? 
Pretty sure not - and even if it would have - with the files intact I 
have always been able to recover them... r-studio being the last resort.
> I guess there's no way to know unless there's some way you can 
> recreate exactly the circumstances that took down your original system 
> (but this time your data on UFS). ;-)
True.

This case - from what my limited knowledge has managed to fathom - is
that a spacemap has become corrupt due to a partial write during the
hard power failure. This was the second hard outage during the resilver
process following a drive platter failure (on a RAIDZ2 - so a single
platter failure should be completely recoverable in all cases - except
hba failure or other corruption, which does not appear to be the
case).. the spacemap fails checksum (no surprises there, being that it
was part written) however it cannot be repaired (for whatever
reason)... I get that this is an interesting case... one cannot just
assume anything about the corrupt spacemap... it could be complete and
just the checksum is wrong, it could be completely corrupt and
ignorable.. but from what I understand of ZFS (and please, watchers,
chime in if I'm wrong) the spacemap is just the freespace map.. if
corrupt or missing one cannot just 'fix it' because there is a very
good chance that the fix would corrupt something that is actually
allocated, and therefore the best solution (to "fix it") would be to
consider it 100% full and therefore 'dead space' .. but zfs doesn't do
that - probably a good thing - the result being that a drive that is
supposed to be good (and zdb reports some +36m objects there) becomes
completely unreadable ...

My thought (desire/want) on a 'walk' tool would be a last resort tool
that could walk the datasets and send them elsewhere (like zfs send) so
that I could create a new pool elsewhere, send the data it knows about
to that pool and then blow away the original - if there are corruptions
or data missing, that's my problem, it's a last resort.. but in the
case where the critical structures become corrupt it means a local
recovery option is enabled.. it means that if the data is all there and
the corruption is just a spacemap one can transfer the entire
drive/data to a new pool whilst the original host is rebuilt... this
would *significantly* help most people with large pools who have to
blow them away and re-create the pools because of errors/corruptions
etc... and with the addition of 'rsync' (the checksumming of files) it
would be trivial to just 'fix' the data corrupted or missing from a
mirror host rather than transferring the entire pool from (possibly)
offsite....
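
To make the "consider it 100% full" idea concrete, here is a minimal sketch in Python. It is a toy model, not ZFS code: it assumes a space map can be pictured as a log of alloc/free records for one region, and the names `Record` and `free_ranges` are made up for illustration. An unreadable log is modelled as `None` and treated as fully allocated, so nothing that might hold live data is ever handed out again.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    kind: str    # "alloc" or "free" (hypothetical record format)
    offset: int  # first block covered by the record
    length: int  # number of blocks covered


def free_ranges(region_size: int,
                log: Optional[List[Record]]) -> List[Tuple[int, int]]:
    """Replay an alloc/free log and return the free (offset, length) ranges.

    log=None models a space map that failed its checksum. In that case the
    whole region is reported as allocated ("dead space"), which wastes the
    space but cannot overwrite anything that is really in use.
    """
    if log is None:
        return []

    allocated = [False] * region_size
    for rec in log:
        for block in range(rec.offset, rec.offset + rec.length):
            allocated[block] = (rec.kind == "alloc")

    ranges: List[Tuple[int, int]] = []
    start = None
    for block, used in enumerate(allocated):
        if not used and start is None:
            start = block                      # a free run begins
        elif used and start is not None:
            ranges.append((start, block - start))
            start = None                       # the free run ends
    if start is not None:
        ranges.append((start, region_size - start))
    return ranges


log = [Record("alloc", 0, 4), Record("alloc", 8, 4), Record("free", 0, 2)]
print(free_ranges(16, log))   # [(0, 2), (4, 4), (12, 4)]
print(free_ranges(16, None))  # [] : nothing is handed out again
```

The point of the sketch is only that the free-space view is derived data: reading existing blocks never needs it, which is why a read-only "walk and send" tool seems plausible in principle.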

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/

Steven Hartland

2019-May-01 17:39 UTC

ZFS...

On 01/05/2019 15:53, Michelle Sullivan wrote:
> Paul Mather wrote:
>> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan <michelle at sorbs.net>
>> wrote:
>>
>>> Been there done that though with ext2 rather than UFS..  still got 
>>> all my data back... even though it was a nightmare..
>>
>>
>> Is that an implication that had all your data been on UFS (or ext2:) 
>> this time around you would have got it all back?  (I've got that 
>> impression through this thread from things you've written.) That sort 
>> of makes it sound like UFS is bulletproof to me.
>
> It's definitely not bulletproof (far from it) - however when the 
> data on disk is not corrupt I have managed to recover it - even if it 
> has been a nightmare - no structure - all files in lost+found etc... 
> or even resorting to r-studio in the event of lost raid information etc..

Yes but you seem to have done this with ZFS too, just not in this 
particularly bad case.

If you imagine that the in-memory update for the metadata was corrupted 
and then written out to disk, which is what you seem to have experienced 
with your ZFS pool, then you'd be in much the same position.
>
> This case - from what my limited knowledge has managed to fathom - is
> that a spacemap has become corrupt due to a partial write during the
> hard power failure. This was the second hard outage during the resilver
> process following a drive platter failure (on a RAIDZ2 - so a single
> platter failure should be completely recoverable in all cases - except
> hba failure or other corruption, which does not appear to be the
> case).. the spacemap fails checksum (no surprises there, being that it
> was part written) however it cannot be repaired (for whatever
> reason)... I get that this is an interesting case... one cannot just
> assume anything about the corrupt spacemap... it could be complete and
> just the checksum is wrong, it could be completely corrupt and
> ignorable.. but from what I understand of ZFS (and please, watchers,
> chime in if I'm wrong) the spacemap is just the freespace map.. if
> corrupt or missing one cannot just 'fix it' because there is a very
> good chance that the fix would corrupt something that is actually
> allocated, and therefore the best solution (to "fix it") would be to
> consider it 100% full and therefore 'dead space' .. but zfs doesn't do
> that - probably a good thing - the result being that a drive that is
> supposed to be good (and zdb reports some +36m objects there) becomes
> completely unreadable ...
>
> My thought (desire/want) on a 'walk' tool would be a last resort tool
> that could walk the datasets and send them elsewhere (like zfs send) so
> that I could create a new pool elsewhere, send the data it knows about
> to that pool and then blow away the original - if there are corruptions
> or data missing, that's my problem, it's a last resort.. but in the
> case where the critical structures become corrupt it means a local
> recovery option is enabled.. it means that if the data is all there and
> the corruption is just a spacemap one can transfer the entire
> drive/data to a new pool whilst the original host is rebuilt... this
> would *significantly* help most people with large pools who have to
> blow them away and re-create the pools because of errors/corruptions
> etc... and with the addition of 'rsync' (the checksumming of files) it
> would be trivial to just 'fix' the data corrupted or missing from a
> mirror host rather than transferring the entire pool from (possibly)
> offsite....
From what I've read that's not a partial write issue, as in that case 
the pool would have just rolled back. It sounds more like the write was 
successful but the data in that write was trashed due to your power 
incident and that was replicated across ALL drives.
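
The difference between the two failure modes can be shown with a toy model in Python. This is a sketch under simplifying assumptions, not the real on-disk format: each transaction group is pictured as a root block whose checksum is computed before the write, and the names `RootBlock`, `write_root` and `newest_valid` are hypothetical. A torn write fails verification and the previous txg is picked, whereas data that was already garbage in memory gets a matching checksum and is accepted.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


@dataclass
class RootBlock:
    txg: int
    payload: bytes          # what actually reached the disk
    stored_checksum: str    # computed from what was *meant* to be written


def write_root(txg: int, payload: bytes, torn: bool = False) -> RootBlock:
    """Model one root-block write; torn=True means power was lost mid-write."""
    expected = checksum(payload)
    on_disk = payload[: len(payload) // 2] if torn else payload
    return RootBlock(txg, on_disk, expected)


def newest_valid(blocks: List[RootBlock]) -> Optional[RootBlock]:
    """Pick the highest txg whose payload still matches its checksum."""
    valid = [b for b in blocks if checksum(b.payload) == b.stored_checksum]
    return max(valid, key=lambda b: b.txg, default=None)


# Case 1: partial write. txg 101 fails its checksum, so we fall back to 100.
history = [write_root(100, b"pool state at txg 100"),
           write_root(101, b"pool state at txg 101", torn=True)]
print(newest_valid(history).txg)  # 100

# Case 2: the write completed, but the data was already trashed in memory.
# The checksum was computed over the garbage, so it verifies and is used.
history[1] = write_root(101, b"garbage from a bad in-memory update")
print(newest_valid(history).txg)  # 101
```

In the second case the checksum cannot help, because it faithfully describes the bad data; that is the scenario being suggested here, as opposed to a simple partial write.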

To be clear, this may or may not be what you're seeing, as you don't seem 
to have covered any of the details of the issues you're seeing, or what 
steps, in detail, you have tried to recover with?

I'm not saying this is the case but all may not be lost depending on the 
exact nature of the corruption.

For more information on space maps see:
https://www.delphix.com/blog/delphix-engineering/openzfs-code-walk-metaslabs-and-space-maps
https://sdimitro.github.io/post/zfs-lsm-flushing/

A similar behavior turned out to be a bug:
https://www.reddit.com/r/zfs/comments/97czae/zfs_zdb_space_map_errors_on_unmountable_zpool/

    Regards
    Steve

Michelle Sullivan

2019-May-01 23:46 UTC

ZFS...

Michelle Sullivan
http://www.mhix.org/
Sent from my iPad
> On 02 May 2019, at 03:39, Steven Hartland <killing at multiplay.co.uk> wrote:
> 
> 
> 
>> On 01/05/2019 15:53, Michelle Sullivan wrote:
>> Paul Mather wrote:
>>>> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan <michelle at sorbs.net> wrote:
>>>> 
>>>> Been there done that though with ext2 rather than UFS..  still got
>>>> all my data back... even though it was a nightmare..
>>> 
>>> 
>>> Is that an implication that had all your data been on UFS (or ext2:)
>>> this time around you would have got it all back?  (I've got that
>>> impression through this thread from things you've written.) That sort
>>> of makes it sound like UFS is bulletproof to me.
>> 
>> It's definitely not bulletproof (far from it) - however when the data
>> on disk is not corrupt I have managed to recover it - even if it has
>> been a nightmare - no structure - all files in lost+found etc... or
>> even resorting to r-studio in the event of lost raid information etc..
> Yes but you seem to have done this with ZFS too, just not in this
> particularly bad case.
> 
There is no r-studio for zfs or I would have turned to it as soon as this issue
hit.

> If you imagine that the in-memory update for the metadata was corrupted and
> then written out to disk, which is what you seem to have experienced with your
> ZFS pool, then you'd be in much the same position.
>> 
>> This case - from what my limited knowledge has managed to fathom - is
>> that a spacemap has become corrupt due to a partial write during the
>> hard power failure. This was the second hard outage during the resilver
>> process following a drive platter failure (on a RAIDZ2 - so a single
>> platter failure should be completely recoverable in all cases - except
>> hba failure or other corruption, which does not appear to be the
>> case).. the spacemap fails checksum (no surprises there, being that it
>> was part written) however it cannot be repaired (for whatever
>> reason)... I get that this is an interesting case... one cannot just
>> assume anything about the corrupt spacemap... it could be complete and
>> just the checksum is wrong, it could be completely corrupt and
>> ignorable.. but from what I understand of ZFS (and please, watchers,
>> chime in if I'm wrong) the spacemap is just the freespace map.. if
>> corrupt or missing one cannot just 'fix it' because there is a very
>> good chance that the fix would corrupt something that is actually
>> allocated, and therefore the best solution (to "fix it") would be to
>> consider it 100% full and therefore 'dead space' .. but zfs doesn't do
>> that - probably a good thing - the result being that a drive that is
>> supposed to be good (and zdb reports some +36m objects there) becomes
>> completely unreadable ...
>> 
>> My thought (desire/want) on a 'walk' tool would be a last resort tool
>> that could walk the datasets and send them elsewhere (like zfs send) so
>> that I could create a new pool elsewhere, send the data it knows about
>> to that pool and then blow away the original - if there are corruptions
>> or data missing, that's my problem, it's a last resort.. but in the
>> case where the critical structures become corrupt it means a local
>> recovery option is enabled.. it means that if the data is all there and
>> the corruption is just a spacemap one can transfer the entire
>> drive/data to a new pool whilst the original host is rebuilt... this
>> would *significantly* help most people with large pools who have to
>> blow them away and re-create the pools because of errors/corruptions
>> etc... and with the addition of 'rsync' (the checksumming of files) it
>> would be trivial to just 'fix' the data corrupted or missing from a
>> mirror host rather than transferring the entire pool from (possibly)
>> offsite....
> 
> From what I've read that's not a partial write issue, as in that case the
> pool would have just rolled back. It sounds more like the write was
> successful but the data in that write was trashed due to your power
> incident and that was replicated across ALL drives.
> 
I think this might be where the problem started.. it was already rolling back
from the first power issue (it did exactly what was expected and programmed, it
rolled back 5 seconds.. which, as no-one had write access to it from the start
of the resilver, I really didn't care about as the only changes were the
resilver itself.). Now your assertion/musing may be correct...  all drives got
trashed data.. I think not, but unless we get into it and examine it I think we
won't know.  What I do know is in the second round -FfX wouldn't work, I used
zdb to locate a "LOADED" MOS and used -t <txg> to import.. the txg number was 7
or 8 from current so just outside of the -X limit (going off memory here, so
could have been more, but I remember it was just past the switch limit.)
> To be clear, this may or may not be what you're seeing, as you don't seem
> to have covered any of the details of the issues you're seeing, or what
> steps, in detail, you have tried to recover with?
There have been many steps over the last month.. and with some of them I may
have taken it from very difficult to recover to non-recoverable now... though
the only writes are what the kernel does, as I have not had it (the dataset)
mounted at any time, even though it has imported.
> 
> I'm not saying this is the case but all may not be lost depending on
> the exact nature of the corruption.
> 
> For more information on space maps see:
> https://www.delphix.com/blog/delphix-engineering/openzfs-code-walk-metaslabs-and-space-maps
This is something I read a month ago, along with multiple other articles on the
same blog, including https://www.delphix.com/blog/openzfs-pool-import-recovery

Which I might add got me from non importable to importable but not mountable.

I have *not* attempted to bypass the checksum line for spacemap load to date as
I see that as a possible way to make the problem worse.
> https://sdimitro.github.io/post/zfs-lsm-flushing/
Not read this.
> 
> A similar behavior resulted in being a bug:
> https://www.reddit.com/r/zfs/comments/97czae/zfs_zdb_space_map_errors_on_unmountable_zpool/
> 
Or this.. will go there following "pressing send".. :)
>     Regards
>     Steve

Michelle Sullivan

2019-May-01 23:52 UTC

ZFS...

Michelle Sullivan
http://www.mhix.org/
Sent from my iPad
> On 02 May 2019, at 09:46, Michelle Sullivan <michelle at sorbs.net> wrote:
> 
> What I do know is in the second round -FfX wouldn't work,
*after the second round
