Hi all, Just subscribed to the list after a debate on our helpdesk led me to the posting about ZFS corruption and the need for a fsck repair tool of some kind... Has there been any update on this? Kind regards, Kevin Walker Coreix Limited DDI: (+44) 0207 183 1725 ext 90 Mobile: (+44) 07960 967818 Fax: (+44) 0208 53 44 111 ********************************************************************* This message is intended solely for the use of the individual or organisation to whom it is addressed. It may contain privileged or confidential information. If you are not the intended recipient, you should not use, copy, alter, or disclose the contents of this message *********************************************************************
ZFS scrub will detect many types of error in your data or the filesystem metadata. If you have sufficient redundancy in your pool and the errors were not due to dropped or misordered writes, then they can often be automatically corrected during the scrub. If ZFS detects an error from which it cannot automatically recover, it will often instantly lock your entire pool to prevent any read or write access, informing you only that you must destroy it and "restore from backups" to get your data back. Your only recourse in such situations is to do exactly that, or enlist the help of Victor Latushkin to attempt to recover your pool using painstaking manual manipulation. Recent putbacks seem to indicate that future releases will provide a mechanism to allow mere mortals to recover from some of the errors caused by dropped writes. cheers, Rob -- This message posted from opensolaris.org
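For anyone new to the thread who wants to see the "online fsck" Rob describes, a scrub is just a couple of commands. A minimal sketch (the pool name "tank" is only an example):

  zpool scrub tank        # walk every block in the pool, verifying it against its checksum and all redundant copies
  zpool status -v tank    # show scrub progress/result and list any files with unrecoverable errors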
This functionality is in the ZFS code now; it will be available to the rest of us in a later build. http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html -- This message posted from opensolaris.org
Also, read this: http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html -- This message posted from opensolaris.org
Joerg just posted a lengthy answer to the fsck question: http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html Good stuff. I see two answers to "nobody complained about lying hardware before ZFS". One: The user has never tried another filesystem that tests for end-to-end data integrity, so ZFS notices more problems, and sooner. Two: If you lost data with another filesystem, you may have overlooked it and blamed the OS or the application, instead of the inexpensive hardware. -- This message posted from opensolaris.org
Kevin Walker wrote:
> Hi all,
>
> Just subscribed to the list after a debate on our helpdesk led me to the posting about ZFS corruption and the need for a fsck repair tool of some kind...
>
> Has there been any update on this?
>

I guess the discussion started after someone read an article on OSNEWS.

The way zfs works is that you basically get an fsck equivalent while using a pool. ZFS verifies checksums for all metadata and user data as it is read. All metadata also uses ditto blocks to provide two or three copies (entirely independent of any pool redundancy), depending on the type of metadata. If a copy is corrupted, a second (or third) copy is used, so correct data is returned and the corrupted block is automatically repaired. The ability to repair a block containing user data depends on whether the pool is configured with redundancy. But even if the pool is non-redundant (let's say a single disk drive), zfs will still detect the corruption and tell you which files are affected, while the metadata will remain correct in most cases (unless the corruption is so large and so spread out that it affects all copies of a block in the pool). You will still be able to read all other files and the other parts of the affected file.

So fsck effectively happens while you are accessing your data, and it is even better than fsck on most other filesystems: thanks to checksumming of all data and metadata, zfs knows exactly when something is wrong and in most cases can fix it on the fly. If you want to scan the entire pool, including all redundant copies, and have anything that doesn't checksum repaired, you can schedule a pool scrub (while your applications are still using the pool!). This forces every block of every copy to be read and its checksum verified; data is corrected where possible and the result is reported to the user. Legacy fsck is not even close.

I think the perceived need for an fsck for ZFS comes partly from a lack of understanding of how ZFS works and partly from some frustrated users who, under very unlikely and rare circumstances, ended up unable to import a pool at all due to data corruption, and therefore unable to access any data, even though the corruption may have affected only a relatively small amount of data. Most other filesystems will let you access most of the data after fsck in such a situation (probably with some data loss), while zfs left the user with no access to data at all. In such a case the problem lies with the zfs uberblock, and the remedy is to revert the pool to its previous uberblock (or an even earlier one). In almost all cases this renders the pool importable, and then the mechanisms described in the first paragraph above kick in. The problem is (was) that the procedure to revert a pool to one of its previous uberblocks was neither documented nor automatic, and was definitely far from sysadmin-friendly. But thanks to some community members (most notably Victor, I think) some users affected by the issue were given a hand and were able to recover most or all of their data. Others were probably assisted by Sun's support service, I guess.

Fortunately a much more user-friendly mechanism has finally been implemented and integrated into Open Solaris build 126, which allows a user to import a pool and force it back to one of the previous versions of its uberblock if necessary. See http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html for more details.
There is another CR (I don't have its number at hand) about implementing delayed re-use of just-freed blocks, which should allow more data to be recovered in a case like the above. Although I'm not sure it has been implemented yet.

IMHO, with the above CR implemented, in most cases ZFS currently provides a *much* better answer to random data corruption than any other filesystem+fsck on the market. Personally I don't blame Sun that implementing the CR took so long, as it mostly affected home users with cheap hardware from BestBuy-like sources, and even then it was relatively rare. So-called enterprise customers were affected even less, and they either had enough expertise or called Sun's support organization to get a pool manually reverted to its previous uberblock. So from Sun's perspective the issue was far from top priority, and resources are limited as usual. Still, IIRC it was some vocal users here complaining about the issue who convinced the ZFS developers to get it expedited... :)

ps. sorry for a chaotic email, but lack of time is my friend as usual :)

-- Robert Milkowski http://milek.blogspot.com
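As a footnote to Robert's point about ditto blocks: the same protection can be extended to user data on a non-redundant pool via the copies property, at the cost of space. A small illustration (the dataset name tank/home is just an example, and copies only applies to data written after the property is set):

  zfs set copies=2 tank/home         # keep two on-disk copies of every user-data block written from now on
  zfs get copies,checksum tank/home  # confirm the setting; checksum shows the algorithm in use
  zfs set checksum=sha256 tank/home  # optional: switch to a stronger checksum, as Robert mentions later in the thread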
Robert Milkowski wrote: [full quote of the previous message trimmed]
> > Fortunately a much more user-friendly mechanism has finally been
> implemented and integrated into Open Solaris build 126, which allows a
> user to import a pool and force it back to one of the previous versions of its
> uberblock if necessary. See
> http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html
> for more details.
>
> There is another CR (I don't have its number at hand) about
> implementing delayed re-use of just-freed blocks, which should allow
> more data to be recovered in a case like the above. Although I'm not
> sure it has been implemented yet.
>
> IMHO, with the above CR implemented, in most cases ZFS currently provides
> a *much* better answer to random data corruption than any other
> filesystem+fsck on the market.
>

The code for the putback of 2009/479 allows reverting to an earlier uberblock AND defers the re-use of blocks for a short time to make this "rewind" safer.

-tim
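For reference, my reading of the 2009/479 putback is that the recovery is exposed through new options on zpool import and zpool clear; something like the following (the pool name is an example, and the exact flags are worth verifying against the zpool man page on build 128 or later):

  zpool import -nF tank    # dry run: report whether a rewind to an earlier txg would succeed and what would be discarded
  zpool import -F tank     # import, rewinding to the last good uberblock/txg if the current one is damaged
  zpool clear -F tank      # the same rewind logic for a pool that is already imported but faulted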
Tim Haley wrote:
> Robert Milkowski wrote:
>>
>> There is another CR (I don't have its number at hand) about
>> implementing delayed re-use of just-freed blocks, which should allow
>> more data to be recovered in a case like the above. Although I'm
>> not sure it has been implemented yet.
>>
>> IMHO, with the above CR implemented, in most cases ZFS currently
>> provides a *much* better answer to random data corruption than any
>> other filesystem+fsck on the market.
>>
> The code for the putback of 2009/479 allows reverting to an earlier
> uberblock AND defers the re-use of blocks for a short time to make
> this "rewind" safer.
>

Excellent! Thank you for the information.

-- Robert Milkowski http://milek.blogspot.com
Does this putback mean that I have to upgrade my zpool, or is it a zfs tool? If I missed upgrading my zpool, am I smoked? -- This message posted from opensolaris.org
Orvar Korvar wrote:
> Does this putback mean that I have to upgrade my zpool, or is it a zfs tool? If I missed upgrading my zpool, am I smoked?

The putback did not bump zpool or zfs versions. You shouldn't have to upgrade your pool.

-tim
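If you want to double-check that nothing needs upgrading, something along these lines should do it (output format varies a little by build, and "tank" is just an example name):

  zpool upgrade            # with no arguments, lists any pools not at the current on-disk version
  zpool get version tank   # show the on-disk version of a specific pool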
>>>>> "csb" == Craig S Bell <cbell at standard.com> writes:csb> Two: If you lost data with another filesystem, you may have csb> overlooked it and blamed the OS or the application, yeah, but with ZFS you often lose the whole pool in certain classes of repeatable real-world failures, like hotswap disks with flakey power or SAN''s without NVRAM where the target reboots and the initiator does not. Losing the whole pool is relevantly different to corrupting the insides of a few files. Yes, I know, the red-eyed screaming ZFS rats will come out of the walls screaming ``that 1 bit could have been critical Banking Data on which millions of lives depend and nuclear reactors and spaceships too! Wouldn''t you rather KNOW, even if ZFS desides to inform with zpool_self-destruct_condescending-error()?'''' Maybe, sometimes, yes, but USUALLY, **NO**! I''ve no objection to deciding how much recovery tools are needed based on experience rather than wide-eyed kool-aid ranting or presumptions from earlier filesystems, but so far experience says the recovery work was really needed, so I can''t agree with the bloggers rehashing each other''s zealotry. It would be nice to isolate and fix the underlying problems, though. That is the spirit in all these ``we don''t need no fsck because we are perfect'''' blogs with which I do agree. Their overoptimism isn''t as honest as I''d like about the way ZFS''s error messages do not enough to lead us toward the real cause in the case of SAN problems because they are all designed presuming spatially-clustered, temporally-spread, disk-based failures rather than temporally-clustered interconnect failures, so rather the error detection becomes no more than ``printf("simon sez u will not blame me, blame someone else. these aren''t the droids you''re looking for. move along.");'''' ....but, yeah, the blogger''s point of banging on the whole stack until it works rather than concealing errors, is a good one. Unfortunately I don''t think that''s what will actually happen with these dropped-write SAN failures. I think people will just use the new recovery bits, which conceal errors just like earlier filesystems and fsck tools, and shrug. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20091105/ac1cfc07/attachment.bin>
>>>>> "rm" == Robert Milkowski <milek at task.gda.pl> writes:rm> Personally I don''t blame Sun that implementing the CR took so rm> long as it mostly affected home users with cheap hardware from rm> BestBuy like sources no, many of the reports were FC SAN''s. rm> and even then it was relatively rare. no, they said they were losing way more zpools than they ever lost vxfs''s in the same environment. rm> called enterprise customers were affected even less and then rm> either they had enough expertise or called Sun''s support rm> organization to get a pool manually reverted to its previous rm> uberblock. which is probably why the tool exists. but, great! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20091105/b8ffba8f/attachment.bin>
Miles Nordin wrote:>>>>>> "csb" == Craig S Bell <cbell at standard.com> writes: >>>>>> csb> Two: If you lost data with another filesystem, you may have >>>>>> csb> overlooked it and blamed the OS or the application, >>>>>> >>>>>> yeah, but with ZFS you often lose the whole pool in certain classes of >>>>>> repeatable real-world failures, like hotswap disks with flakey power >>>>>> or SAN''s without NVRAM where the target reboots and the initiator does >>>>>> not. Losing the whole pool is relevantly different to corrupting the >>>>>> insides of a few files.I think that most people including ZFS developers agree with you that losing an access to entire pool is not acceptable. And this has been fixed in snv_126 so now in those rare circumstances you should be able to import a pool. And generally you will end-up in a much better situation than with legacy filesystems + fsck. -- Robert Milkowski http://milek.blogspot.com
Miles Nordin wrote:>>>>>> "rm" == Robert Milkowski <milek at task.gda.pl> writes: >>>>>> > > rm> Personally I don''t blame Sun that implementing the CR took so > rm> long as it mostly affected home users with cheap hardware from > rm> BestBuy like sources > > no, many of the reports were FC SAN''s. > > rm> and even then it was relatively rare. > > no, they said they were losing way more zpools than they ever lost > vxfs''s in the same environment. > >Well, who''s they? I''ve been depolying ZFS for years on many different platforms from low-end, jbods, thru midrange, SAN, and high-end disk arrays and I have yet to loose a pool (hopefully not). It doesn''t mean that some other people did not have problems or did not loose they pools - in most if not in all such cases almost all data could probably be recovered by following manual and "hackish" procedure to rollback to a previous uberblock. Now it is integrated into ZFS and no special knowledge is required to be able to do so in such circumstances. Then there might have been other bugs... life, no software is without them.> rm> called enterprise customers were affected even less and then > rm> either they had enough expertise or called Sun''s support > rm> organization to get a pool manually reverted to its previous > rm> uberblock. > > which is probably why the tool exists. but, great! >The point is that you don''t need the tool now as it is built-in in zfs starting with snv_126.
Hi Robert I think you mean snv_128 not 126 :-) 6667683 need a way to rollback to an uberblock from a previous txg http://bugs.opensolaris.org/view_bug.do?bug_id=6667683 http://hg.genunix.org/onnv-gate.hg/rev/8aac17999e4d Regards Nigel Smith -- This message posted from opensolaris.org
On Thu, 5 Nov 2009, Miles Nordin wrote:
>>>>>> "rm" == Robert Milkowski <milek at task.gda.pl> writes:
>
> rm> Personally I don't blame Sun that implementing the CR took so
> rm> long as it mostly affected home users with cheap hardware from
> rm> BestBuy-like sources
>
> no, many of the reports were FC SANs.

Do you have a secret back-channel to receive these many reports? Are the reports from trolls or gnomes?

> rm> and even then it was relatively rare.
>
> no, they said they were losing way more zpools than they ever lost
> vxfs's in the same environment.

Who are 'they'? Are they the little gnomes that come out at night and lurk in your computer room?

Bob
-- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Robert Milkowski wrote:
> Miles Nordin wrote:
>>>>>>> "csb" == Craig S Bell <cbell at standard.com> writes:
>>>>>>> csb> Two: If you lost data with another filesystem, you may have
>>>>>>> csb> overlooked it and blamed the OS or the application,
>>>>>>>
>>>>>>> yeah, but with ZFS you often lose the whole pool in certain
>>>>>>> classes of repeatable real-world failures, like hotswap disks with flakey power
>>>>>>> or SANs without NVRAM where the target reboots and the initiator
>>>>>>> does not. Losing the whole pool is relevantly different from corrupting
>>>>>>> the insides of a few files.
> I think that most people, including the ZFS developers, agree with you that
> losing access to an entire pool is not acceptable. And this has been
> fixed in snv_126, so now in those rare circumstances you should be able
> to import the pool. And generally you will end up in a much better
> situation than with legacy filesystems + fsck.
>

Just a slight correction. The current build in-process is 128 and that's the build into which the changes were pushed.

-tim
On Thu, Nov 05, 2009 at 03:04:05PM -0700, Tim Haley wrote:
> Robert Milkowski wrote:
> >I think that most people, including the ZFS developers, agree with you that
> >losing access to an entire pool is not acceptable. And this has been
> >fixed in snv_126, so now in those rare circumstances you should be able
> >to import the pool. And generally you will end up in a much better
> >situation than with legacy filesystems + fsck.
> >
> Just a slight correction. The current build in-process is 128 and that's
> the build into which the changes were pushed.

It would be nice to see this information at: http://hub.opensolaris.org/bin/view/Community+Group+on/126-130 but it hasn't changed since 23 October.

-- -Gary Mills- -Unix Group- -Computer and Network Services-
Hi Gary I will let 'website-discuss' know about this problem. They normally fix issues like that. Those pages always seemed to just update automatically. I guess it's related to the website transition. Thanks Nigel Smith -- This message posted from opensolaris.org
Thanks for taking the time to write this - very useful info :) -- This message posted from opensolaris.org
I like it. Any idea what rev of zfs has the PSARC 2009/479 zpool recovery support <http://c0t0d0s0.org/codenews/fragments/c3570a6dcfb6712c7307e758e58550ee7b9c32b8.txt> ? cheers, Brian Craig S. Bell wrote:> Joerg just posted a lengthy answer to the fsck question: > > http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html > > Good stuff. I see two answers to "nobody complained about lying hardware before ZFS". > > One: The user has never tried another filesystem that tests for end-to-end data integrity, so ZFS notices more problems, and sooner. > > Two: If you lost data with another filesystem, you may have overlooked it and blamed the OS or the application, instead of the inexpensive hardware. >
>>>>> "rm" == Robert Milkowski <milek at task.gda.pl> writes:rm> who''s they? posters to this list. not interested in going in endless circles and spending half an hour hunting for citations because Someone is Wrong on the Internet. posts are there, go find them, or agree to disagree. rm> I''ve been depolying ZFS for years on many different platforms rm> from low-end, jbods, thru midrange, SAN, and high-end disk rm> arrays and I have yet to loose a pool well, have you ever lost a vxfs? This is another case of ``I can''t tell you how close to zero the number of problems *I''ve* had with it is. It''s so close, it is zero, so this means by extrapolation that no one is having problems anywhere, and I don''t need to bother reading any `lists'' where people report problems.'''' Sorry, but no. I''m less interested in ``I installed a zpool on something big and expensive and it worked,'''' more interested in ``losing more zpools than vxfs''s in the same clustered environment. we just restore from backup but are annoyed by the lost time,'''' which is the post I remember. rm> in all such cases almost all data could probably be recovered rm> by following manual and "hackish" procedure to rollback to a rm> previous uberblock. this is often not timely, cost-effective, acceptable, or even within reach. rm> Now it is integrated into ZFS and no special knowledge is rm> required this, of course, is. It''s also good that AIUI it works without a pool version bump, so you can boot an exotic new livedvd and recover a pool for an older stable release that lacks the fix. Unfortunately it will take some time to get the new builds and then more time to gain experience and know if the ueberblock rollback fixes bring ZFS resiliency on SAN''s in line with vxfs, especially with the amount of fuzzing around the old issue, of people blaming the lost pools on bitflip gremlins and telling people they need zpool-layer redundancy and citing various papers about UNC''s and CRC errors on 520-byte-sector netapp disks that have nothing to do with the SAN problems, and even now having selective memory of list posts as in ``yes I know the class of problems the ueberblock rollback fixes. But nobody ever had any of those problems, in spite of the fact the problems exist as a CLASS and we can draw a fucking box aroudn it.'''' It was always a hazy box, and I don''t know that anyone really root-caused the SAN problems, we just ended up with a bunch of broken pools that were all recovered using the same technique and imagined backwards from there. I''m optimistic about the fix though. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20091106/5be81144/attachment.bin>
On Nov 6, 2009, at 2:32 PM, Miles Nordin wrote:
> rm> I've been deploying ZFS for years on many different platforms
> rm> from low-end, jbods, thru midrange, SAN, and high-end disk
> rm> arrays and I have yet to lose a pool

Few people have encountered the problem where rollback is the solution. Few people need heart bypass surgery, either.

> well, have you ever lost a vxfs? This is another case of ``I can't
> tell you how close to zero the number of problems *I've* had with it
> is.

[Richard raises his hand, showing the scar from a lost vxfs file system, just below the wrist, next to the vxvm scar :-)]

> It's so close, it is zero, so this means by extrapolation that no
> one is having problems anywhere, and I don't need to bother reading
> any `lists' where people report problems.'' Sorry, but no. I'm less
> interested in ``I installed a zpool on something big and expensive and
> it worked,'' more interested in ``losing more zpools than vxfs's in
> the same clustered environment. we just restore from backup but are
> annoyed by the lost time,'' which is the post I remember.
>
> rm> in all such cases almost all data could probably be recovered
> rm> by following manual and "hackish" procedure to rollback to a
> rm> previous uberblock.
>
> this is often not timely, cost-effective, acceptable, or even within
> reach.
>
> rm> Now it is integrated into ZFS and no special knowledge is
> rm> required
>
> this, of course, is.
>
> It's also good that AIUI it works without a pool version bump, so you
> can boot an exotic new livedvd and recover a pool for an older stable
> release that lacks the fix.

I understand it to work this way. I think the integrators and service folks will manage the expectation setting properly for those folks who have service contracts. For the rest, you might just hear a "boot the latest LiveCD and fix," which also seems reasonable. In a few years, it will be a distant memory.

> Unfortunately it will take some time to get the new builds and then
> more time to gain experience and know if the ueberblock rollback fixes
> bring ZFS resiliency on SANs in line with vxfs, especially with the
> amount of fuzzing around the old issue, of people blaming the lost
> pools on bitflip gremlins and telling people they need zpool-layer
> redundancy and citing various papers about UNCs and CRC errors on
> 520-byte-sector netapp disks that have nothing to do with the SAN
> problems, and even now having selective memory of list posts as in
> ``yes I know the class of problems the ueberblock rollback fixes.

Actually, I like NetApp's response better, in some ways. They are now using a parity block (512 bytes) for every 8 blocks. This can work well for PC-like clients (8 512-byte blocks = 4 KB). This has the beneficial effect of using a code which can contain enough information to support correction, rather than just a digest. Digests, by design, are intended for verification and are completely useless for correction.

Now that ZFS can report the bitwise extent of errors (b125), we can finally get a real sense of the sorts of corruption we're dealing with. One potential problem with using a whole block for checksum/ECC is that there are failure modes which affect multiple blocks, either spatially or temporally. But once you know the failure modes, you can create better solutions...
-- richard
Richard Elling wrote:
> Now that ZFS can report the bitwise extent of errors (b125), we can

Richard, I had not noticed that feature being added. Do you have the bug number for that feature to hand? Thanks Nigel -- This message posted from opensolaris.org
On Fri, Nov 06, 2009 at 03:48:24PM -0800, Richard Elling wrote:
> Actually, I like NetApp's response better, in some ways. They are now
> using a parity block (512 bytes) for every 8 blocks. This can work well
> for PC-like clients (8 512-byte blocks = 4 KB). This has the beneficial
> effect of using a code which can contain enough info to support
> correction, rather than just a digest. Digests, by design, are
> intended for verification and completely useless for correction.

I'm not sure I follow. I thought the 8/9ths thing was just a bundled netapp-style checksum with no parity involved (only used on ATA drives). And that it would go to RAID-4 or RAID-DP for any correction.

-- Darren
On Nov 6, 2009, at 5:02 PM, A Darren Dunham wrote:
> On Fri, Nov 06, 2009 at 03:48:24PM -0800, Richard Elling wrote:
>> Actually, I like NetApp's response better, in some ways. They are now
>> using a parity block (512 bytes) for every 8 blocks. This can work
>> well for PC-like clients (8 512-byte blocks = 4 KB). This has the
>> beneficial effect of using a code which can contain enough info to support
>> correction, rather than just a digest. Digests, by design, are
>> intended for verification and completely useless for correction.
>
> I'm not sure I follow. I thought the 8/9ths thing was just a bundled
> netapp-style checksum with no parity involved (only used on ATA
> drives). And that it would go to RAID-4 or RAID-DP for any correction.

Perhaps, if they would open source it, we could examine it in detail :-) The key is that 512 bytes is enough space that you could implement some, albeit limited, error correction for 4 KB (think RAID-5, 8+1). OTOH, 256 bits is nowhere near enough space to implement an interesting correction code for 128 KB (1M bits).
-- richard
On Nov 6, 2009, at 4:35 PM, Nigel Smith wrote:> Richard Elling wrote: >> Now that ZFS can report the bitwise extent of errors (b125), we can > > Richard, I had not noticed that feature being added. > Do you have the bug number for that feature to hand?PSARC 2009/497 zfs checksum ereport payload additions http://arc.opensolaris.org/caselog/PSARC/2009/497/ CR 6867188 zfs checksum ereports could be more informative http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6867188 Good stuff! -- richard
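For anyone who wants to look at those ereports, they land in the FMA error log; something like the following should show them, including the new payload members described in the PSARC case (I have not listed the member names from memory, check the case log for those):

  fmdump -eV | less                      # dump every ereport in the error log with its full payload
  fmdump -e -c ereport.fs.zfs.checksum   # just the ZFS checksum ereports, filtered by event class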
fyi

Robert Milkowski wrote:
> XXX wrote:
>> | Have you actually tried to roll back to previous uberblocks when you hit the issue? I'm asking as I haven't yet heard about any case of the issue which was not solved by rolling back to a previous uberblock. The problem though was that the way to do it was "hackish".
>>
>> Until recently I didn't even know that this was possible or a likely solution to 'pool panics system on import' and similar pool destruction, and I don't have any tools to do it. (Since we run Solaris 10, we won't have official support for it for quite some time.)
>>
> I wouldn't be that surprised if this particular feature were actually backported to S10 soon. At least you may raise a CR asking for it - maybe you will get access to an IDR first (I'm not saying there is or isn't already one).
>
>> If there are (public) tools for doing this, I will give them a try the next time I get a test pool into this situation.
>>
> IIRC someone sent one to the zfs-discuss list some time ago. Then usually you will also need to poke around with zdb. A sketchy and unsupported procedure was discussed on the list as well. Look at the archives.
>
>> | The bugs which prevented importing a pool in some circumstances were really "annoying" but let's face it - it was bound to happen and they are just bugs which are getting fixed. ZFS is still young after all. And when you google for data loss on other filesystems I'm sure you will find lots of user testimonies - be it ufs, ext3, reiserfs or your favourite one.
>>
>> The difference between ZFS and those other filesystems is that with a few exceptions (XFS, ReiserFS), which sysadmins in the field didn't like either, those filesystems didn't generally lose *all* your data when something went wrong. Their official repair tools could usually put things back together to at least some extent.
>>
> Generally they didn't, although I've seen situations where entire ext2 and ufs filesystems were lost and fsck was not able to get them even mounted (kernel panics right after mounting them). On another occasion fsck was crashing the box; in yet another, fsck claimed everything was ok but then the system kept crashing while doing a backup (fsck can't really properly fix filesystem state - it is largely guessing, and sometimes it goes terribly wrong).
>
> But I agree that generally with other file systems you can recover most or all data just fine. And generally that is the case with zfs too - there were probably more bugs in ZFS as it is a much younger filesystem, but most of them were very quickly fixed. And the uberblock one - I 100% agree that when you hit the issue and didn't know about the manual recovery method it was very bad - but it has finally been fixed.
>
>> (Just as importantly, when they couldn't put things back together you could honestly tell management and the users 'we ran the recovery tools and this is all they could get back'. At the moment, we would have to tell users and management 'well, there are no (official) recovery tools...', unless Sun Support came through for once.)
>>
> But these tools are built into zfs and run automatically, with virtually 100% confidence that if something can be fixed it is fixed correctly, and if something is wrong it will be detected - thanks to end-to-end checksumming of data and metadata. The problem *was* that the one scenario where rolling back to a previous uberblock is required was not handled automatically and required a complicated and undocumented procedure. It wasn't high priority for Sun as it was very rare, wasn't affecting enterprise customers much, and although complicated, the procedure exists and was successfully used on many occasions, even for non-paying customers, thanks to guys like Victor on the zfs mailing list who helped some people in such situations.
>
> But you didn't know about it and it seems like Sun's support service was no use for you - which is really a shame. In your case I would point that out to them and at least get a good deal as compensation or something...
>
> But what is most important is that a fully supported, built-in and easy-to-use procedure is finally available to recover from such situations. As time progresses and more bugs are fixed, ZFS will behave much better in many corner cases, as it already does in Open Solaris - the last 6 months or so have been really productive in fixing many bugs like that.
>
>> | However the whole point of the discussion is that zfs really doesn't need a fsck tool. All the problems encountered so far were bugs and most of them are already fixed. One missing feature was built-in support for rolling back the uberblock, which has just been integrated. But I'm sure there are more bugs to be found..
>>
>> I disagree strongly. Fsck tools have multiple purposes; ZFS obsoletes some of them but not all. One thing fsck is there for is to recover as much as possible after things happen that are supposed to be impossible, like operating system bugs or crazy corruption. ZFS's current attitude is more or less that impossible things won't happen so it doesn't have to do anything (except, perhaps, panic with assert failures).
>>
> This is not true - I will try to explain why. Generally, if you want to recover some data from a filesystem you need to get it into a state where you can mount it (at least read-only). Most legacy filesystems, when they hit a problem where the metadata does not make sense to them, won't allow you to mount the filesystem and will ask you to run fsck. As there are no checksums in these filesystems, there is generally no accurate way of telling how the bad metadata should be fixed. Fsck looks for obvious problems and tries to "guess" in many cases; sometimes it is right and sometimes it is not. Sometimes it won't even detect that there was corruption. Also keep in mind that fsck in most filesystems does not even try to check user data - just metadata. The main reason is that it can't really do it. Because running fsck could potentially be disastrous to a filesystem and lead to even more damage if started automatically (for example during system boot), it runs in an interactive mode, and if some less obvious fixes are required it asks a human to confirm its actions. But even then it is still just guessing what it is supposed to do. And it happens that the situation gets even worse.
>
> Then sometimes there were bugs both in filesystems and fsck, and the user was left with no access to data at all until these bugs were fixed (or the user was skilled enough to fix or work around them on his/her own). I came across such problems on EMC IP4700, EMC Celerra and a couple of other systems in my life. For example, fsck was running for well over 10h, consuming more and more memory, until finally the server ran out of memory and fsck died... and it all started over again, failed again.... In another case fsck kept crashing during repair at the same location, and the filesystem crashed the OS a couple of minutes after being mounted..
>
> The other problem with fsck is that even if it thinks the filesystem is ok, it actually might not be - even its metadata state. Then all sorts of things might happen - like a system panic when accessing a given file or directory, or more data getting corrupted... I was in such a situation a couple of times and it took days to copy files from such a filesystem to another one, with many panics in between, when we had to skip such files or directories, etc. fsck didn't help and reported everything was fine.
>
> Now with ZFS it is a completely different world. ZFS is able, in virtually all cases, to detect whether its metadata and data on disk are corrupted in any way, thanks to its end-to-end checksumming. If someone is concerned about how strong the default checksum (fletcher4) is, one can currently switch zfs to use sha256 and sleep well. So here is the first big difference compared to most filesystems on the market - if some data is corrupted, ZFS does not have to *guess* whether that is the case but can actually detect it with almost 100% confidence. Once such a case is detected, ZFS will try to automatically fix the issue if a redundant copy of the corrupted block is available - if there is, it will all happen transparently to applications without any need to unmount filesystems or run external tools like fsck. Because ZFS checksums both metadata and user data, it can detect and possibly fix data corruption in both (which fsck can't, even if it is lucky). Even if you have no redundancy at the pool level, ZFS metadata blocks are always kept in at least two copies, physically separated on disk if possible. What this means is that even in a single-disk configuration (or stripe), if some data is corrupted zfs will be able to detect it, and if it is a metadata block it will not only detect it but also automatically and transparently fix it and preserve filesystem consistency. There is a simple test you may run - create a pool on top of one disk drive, put some files in it, then overwrite let's say 20% of the disk drive with some random data or zeros while zfs is running. Then flush caches (export/import the pool) and try to access all metadata by doing a full ls -lra on the filesystem. You should be able to get a full listing with proper attributes, etc., but if you check zpool status it will probably report many checksum errors which were corrected. (When overwriting, overwrite a portion of the beginning of the disk, as zfs will usually start writing to a disk from the beginning.) Now, if you actually try to read a file's contents it should be fine if you are lucky enough to read one which was not overwritten; if you are unlucky you won't be able to read the blocks which are corrupted (since you don't have any redundancy at the zfs level it can't fix its user data, but it can detect the damage), but you will be able to read all the other blocks from the file. Now try to do something like this with any other file system - you will probably end up with an OS panic, and in many cases fsck won't be able to recover the file system to a point where you can recover some data.... and when fixing it, it will only be guessing what to do and will skip user data entirely...
>
> Now there is a specific case of the above where the metadata describing the pool itself, or its root block, is corrupted and can't be fixed because all copies are wrong. ZFS can also detect this, but extra functionality to actually try to use the N-1 rootblock in such a case was not implemented until very recently. This was very unfortunate, but because it was very rare in the field and resources are limited as usual, it wasn't implemented - instead there was an undocumented, unsupported and hard-to-follow procedure for doing it manually, and some people did use it successfully (check the zfs-discuss archives). But of course it shouldn't be like that, and the ZFS developers did recognize it by accepting a bug report on it. But limited resources...... Fortunately a built-in mechanism to deal with such a case has finally been implemented. So now when it happens, a user will have the choice of importing a pool with an extra option to roll back to a previous txg so the pool can be imported. From then on, all the mechanisms described above kick in. And again - no guessing here, but a guarantee of detecting corruption and fixing it if possible. And you don't even have to run any check and wait hours, sometimes days, on large filesystems with millions of files before you can access your data (and still not be sure what exactly you're accessing and whether it will cause further issues). Of course it would probably be wise to run zpool scrub, to force all data and metadata to be read, checksummed and fixed where possible, at a time convenient for you, but in the meantime you may run your applications and any corruption will be detected and fixed while data is being accessed.
>
> So from the practical point of view you may think of the mechanisms in ZFS as a built-in fsck with the ability to actually detect when corruption happens (instead of just guessing at it, and for user data as well as metadata), and to get it fixed if a redundant copy is available (transparently to applications). Having a separate tool doesn't really make sense here. Of course you can always write a script called fsck.zfs which will import a pool and run zpool scrub if you want. And sometimes people will do exactly that before going back into production. But having a genuine extra tool like fsck doesn't really make sense - what exactly should such a tool do (keeping in mind all the above)?
>
> Then there were a couple of bugs which prevented ZFS from importing a pool with some specific corruptions which were entirely fixable (AFAIK all known ones were fixed in Open Solaris). When you think about it - we are talking about bugs here - if you put all the recovery mechanisms into a separate tool called fsck with the same bugs, it wouldn't be able to repair such a pool anyway, would it? So you would need to fix these bugs first - but once you fixed them, zfs would be able to mount such a pool and an external tool would still not be needed (or, after applying a patch/fix, do 'alias fsck="zpool import"' and then 'fsck pool' will get your pool fixed... :)
>
> You might ask what you are supposed to do until such a bug is fixed? Well, what would you do if you weren't able to mount an ext2 filesystem (or any other) and there was a bug in its fsck which prevented it from getting the fs into a mountable state.... you would have to wait for a fix, or get it fixed yourself, or play with its on-disk format with tools like e2fs, fsdb, ... and try to fix the filesystem manually. Well, on zfs you also have zdb... or you would probably be forced to recover data from backup.
>
> The point here is that most filesystems and their tools had such bugs, and zfs is one of the youngest filesystems on the market, so it is no wonder in a way that such bugs are getting fixed now and not 5-7 years ago. Then there is a critical mass of users required for a given filesystem, so that it gets deployed in many different environments, different workloads, hardware, drivers, usage cases, ... so all these corner cases can surface, users hopefully report them and they get fixed. ZFS has become widely deployed only in the last couple of years or so, so it is no wonder that most of these bugs were spotted (and fixed) during the same period.
>
> But then, thanks to the fundamentally different architecture of ZFS, once most (all? :)) bugs like these are fixed, ZFS offers something MUCH better than legacy filesystems + fsck. It offers a guarantee of detecting data corruption and fixing it properly when possible, while reporting what can't be fixed and still providing access to all the other data in your pool.
>
> btw: the email exchange is private so I don't want to include zfs-discuss without your consent, but if you want to forward this email to zfs-discuss for other users' benefit feel free to do so.
>
>> ) As the evolution of ZFS has demonstrated, impossible things *do* happen and you *do* need the ability to recover as much as possible. ZFS is busy slapping bandaids over specific problems instead of dealing with the general issue.
>>
> Just a quick "google" and:
>
> 1. fsck fails and causes panic of Linux kernel
> https://bugzilla.redhat.com/show_bug.cgi?id=126238
>
> 2. btrfs - filesystem got corrupted, running btrfsck causes even more damage and the entire filesystem is nuked due to a bug. BTRFS is not the best example as it is far from being production ready, but still...
> https://bugzilla.redhat.com/show_bug.cgi?id=497821
>
> 3. linux gfs2 - fsck has a bug (or lacks a feature) and is not able to fix the filesystem with a specific corruption, and the filesystem is unmountable. The only option is to manually fix data on-disk with help from a support service on a case-by-case basis...
> https://bugzilla.redhat.com/show_bug.cgi?id=457557
>
> 4. e2fsck segfaults + dumps core when trying to check a filesystem
> https://bugzilla.redhat.com/show_bug.cgi?id=108075
>
> 5. ext3 filesystem crashes - fsck can't repair it and goes into an infinite loop.... fixed in a development version of fsck
> https://bugzilla.redhat.com/show_bug.cgi?id=467677
>
> 6. gfs2 corruption is causing a linux kernel to panic.... fsck says it fixes the issue but it doesn't, and the system crashes all over again under load...
> https://bugzilla.redhat.com/show_bug.cgi?id=519049
>
> 7. ext3 filesystem can't be mounted and fsck won't finish after 10 days of running (probably some kind of infinite-looping bug again)
> http://ubuntuforums.org/archive/index.php/t-394744.html
>
> 8. AIX JFS2 filesystem corruption - due to a bug in fsck it can't fix the fs, data had to be recovered from backup
> http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/error-518-file-system-corruption-366503
>
> 9.
> https://bugzilla.redhat.com/show_bug.cgi?id=514511
> https://bugzilla.redhat.com/show_bug.cgi?id=477856
>
> And there are many more...
>
> The point again is that bugs happen even in fsck, and until they are fixed a common user or sysadmin quite often won't be able to recover on their own. ZFS is no exception here when it comes to bugs. But thanks to its different approach (mostly end-to-end checksumming + COW), its ability to detect data corruption and deal with it exceeds most generally available solutions on the market. The fixes for some of the bugs mentioned above only make it more robust and reliable, even for those previously unlucky users... :)
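The single-disk corruption test Robert describes in the forwarded message can be sketched roughly as follows. This is deliberately destructive and every name in it is an assumption (a scratch slice c1t1d0s0 and a throwaway pool called tank); do not point it at a disk holding anything you care about. It skips the first 1 MB so the front ZFS labels survive, and it exports before the overwrite as a simplification of Robert's "corrupt it while running, then export/import to flush caches":

  zpool create tank c1t1d0s0                # single-slice pool, no redundancy
  cp -r /usr/share/man /tank                # put some data in it
  zpool export tank                         # quiesce the pool so nothing is cached
  dd if=/dev/zero of=/dev/rdsk/c1t1d0s0 bs=1024k seek=1 count=200   # clobber ~200 MB behind the labels
  zpool import tank
  ls -lRa /tank > /dev/null                 # walk all metadata; ditto blocks repair damaged copies on the fly
  zpool status -v tank                      # checksum error counts, plus any files with unrecoverable user data
  zpool scrub tank                          # force a full pass over every remaining block

The end state is what Robert describes: on import, zpool status shows repaired metadata, while user data that had no surviving copy is flagged rather than silently returned.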
On Sun, Nov 8, 2009 at 7:55 AM, Robert Milkowski <milek at task.gda.pl> wrote:> > fyi > > Robert Milkowski wrote: >> >> XXX wrote: >>> >>> | Have you actually tried to roll-back to previous uberblocks when you >>> | hit the issue? ?I''m asking as I haven''t yet heard about any case >>> | of the issue witch was not solved by rolling back to a previous >>> | uberblock. The problem though was that the way to do it was "hackish". >>> >>> ?Until recently I didn''t even know that this was possible or a likely >>> solution to ''pool panics system on import'' and similar pool destruction, >>> and I don''t have any tools to do it. (Since we run Solaris 10, we won''t >>> have official support for it for quite some time.) >>> >> >> I wouldn''t be that surprised if this particular feature would actually be >> backported to S10 soon. At least you may raise a CR asking for it - maybe >> you will get an access to IDR first (I''m not saying there is or isn''t >> already one). >> >>> ?If there are (public) tools for doing this, I will give them a try >>> the next time I get a test pool into this situation. >>> >> >> IIRC someone send one to the zfs-discuss list some time ago. >> Then usually you will also need to poke with zdb. >> A sketchy and unsupported procedure was discussed on the list as well. >> Look at the archives. >> >>> | The bugs which prevented importing a pool in some circumstances were >>> | really "annoying" but lets face it - it was bound to happen and they >>> | are just bugs which are getting fixed. ZFS is still young after all. >>> | And when you google for data loss on other filesystems I''m sure you >>> | will find lots of user testimonies - be it ufs, ext3, raiserfs or your >>> | favourite one. >>> >>> ?The difference between ZFS and those other filesystems is that with >>> a few exceptions (XFS, ReiserFS), which sysadmins in the field didn''t >>> like either, those filesystems didn''t generally lose *all* your data >>> when something went wrong. Their official repair tools could usually >>> put things back together to at least some extent. >>> >> >> Generally they didn''t although I''ve seen situation when entire ext2 and >> ufs were lost and fsck was not able to get them even mounted (kernel panics >> right after mounting them). In other occassion fsck was crashing the box in >> yet another one fsck claimed everything was ok but then when doing backup >> system was crashing (fsck can''t really properly fix filesystem state - it is >> more of guessing and sometimes it goes terribly wrong). >> >> But I agrre that generally with other file systems you can recover most or >> all data just fine. >> And generally it is the case with zfs - there were probably more bugs in >> ZFS as it is much younger filesystem, but most of them were very quickly >> fixed. And the uberblock one - I 100% agree then when you hit the issue and >> didn''t know about manual method to recover it was very bad - but it has >> finally been fixed. >> >>> (Just as importantly, when they couldn''t put things back together you >>> could honestly tell management and the users ''we ran the recovery tools >>> and this is all they could get back''. At the moment, we would have >>> to tell users and management ''well, there are no (official) recovery >>> tools...'', unless Sun Support came through for once.) 
>>> >> >> But these tools are built-in into zfs and are happening automatically and >> with virtually 100% confidence that if something can be fixed it is fixed >> correctly and if something is wrong it will be detected - thanks to >> end-to-end checksumming of data and meta-data. The problem *was* that one >> case scenario when rolling back to previous uberblock is required was not >> implemented and required a complicated and undocumented procedure to follow. >> It wasn''t high priority for Sun as it was very rare , wasn''t affecting much >> enterprise customers and although complicated the procedure is there is one >> and was successfully used on many occasions even for non paying customers >> thanks to guys like Victor on the zfs mailing list who helped some people in >> such a situations. >> >> But you didn''t know about it and it seems like Sun''s support service was >> no use for you - which is really a shame. >> In your case I would probably point that out to them and at least get some >> good deal as a compensation or something... >> >> But what is most important is that finally fully supported, built in and >> easy to use procedure is available to recover from such situations. As time >> will progress and more bugs will be fixed ZFS will behave much better under >> many corner cases as it does already in Open Solaris - last 6 months or so >> were really very productive in fixing many bugs like that. >> >>> | However the whole point of the discussion is that zfs really doesn''t | >>> need a fsck tool. >>> | All the problems encountered so far were bugs and most of them are | >>> already fixed. One missing feature was a built-in support for | rolling-back >>> uberblock which just has been integrated. But I''m sure | there are more bugs >>> to be found.. >>> >>> ?I disagree strongly. Fsck tools have multiple purposes; ZFS obsoletes >>> some of them but not all. One thing fsck is there for is to recover as >>> much as possible after things happen that are supposed to be impossible, >>> like operating system bugs or crazy corruption. ZFS''s current attitude >>> is more or less that impossible things won''t happen so it doesn''t have >>> to do anything (except, perhaps, panic with assert failures). >>> >> >> This is not true - I will try to explain why. >> Generally if you want to recover some data from a filesystem you need to >> get it into a state you can mount it (at least RO). Most legacy filesystems >> when ?hitting with the problem that metadata do not make sense to them and >> they think it is wrong ?won''t allow you to mount the filesystem and will ask >> you to run fsck. Now as there are not checksum in these filesystems >> generally there is no accurate way of telling how the bad metadata should be >> fixed. Fsck is looking for obvious things and is trying to "guess" in many >> cases and sometimes it is right and sometimes it is not. Then sometimes it >> won''t even detect then there was corruption. Also keep in mind that fsck in >> most filesystems does not even try to check for user data - just metadata. >> The main reason is that it can''t really do it. >> Now because running fsck could potentially be disastrous ?to a filesystem >> and lead to even more damage if it is started automatically (for example >> during system boot) it is started in an interactive-mode and if some less >> obvious fixes are required it will require a human to confirm its action. >> But even then it is still just guessing what it is supposed to do. And it >> happens that situation gets even worse. 
>> >> Then sometimes there were bugs both in filesystems and fsck and user was >> left with no access to data at all until these bugs were fixed (or user was >> skilled enough to fix/workaround them on his/her own). I came across such >> problems on EMC IP4700, EMC Celerra and couple of other systems in my life. >> For example fsck was running for well over 10h consuming more and more >> memory and finally server was running out of memory and fsck died... and it >> all started over again, failed again.... in other case fsck was just >> crashing during repair in the same location and file system was crashing the >> os after couple of minutes after mounting it.. >> >> The other problem with fsck is that even if it thinks that filesystem is >> ok it actually might not be - even its metadata state. Then all different >> things might happen - like when accessing a given file or directory a system >> will panic or more data will get corrupted... I was in such a situation >> couple of times and it took days to copy files from such a filesystem to >> another one with many panics in-between when we had to skip such files or >> directories, etc. fsck didn''t help and reported everything is fine. >> >> Now with ZFS it is completely different world. ZFS is able in virtually >> all cases to detect if its meta-data and data on-disk is corrupted in anyway >> or not thanks to its end-to-end checksumming. If someone is concern with how >> strong default checksumming is (fletcher4) then currently one cas switch zfs >> to use sha256 to have a good sleep. So here is first big difference compared >> to most filesystems in a market - ZFS if some data is corrupted does not >> have to *guess* if it is the case or not but can actually detect it with >> almost 100% confidence when it is the case. >> Once such a case is detected ZFS will try to automatically fix the issue >> if there is redundant copy of corrupted block available - if there is it >> will all happen transparently to applications without any need to unmount >> filesystems or run external tools like fsck. Then because ZFS checksums both >> metadata and user data it will be able to detect and possibly fix data >> corruptions in both cases (which fsck can''t even if it is lucky). Now even >> if you are not doing any redundancy at pool level by using ZFS its metadata >> blocks are always kept in at least two copies physically separated on disk >> if possible. What it means is that even in a single disk configuration (or >> stirpe) if some data is corrupted zfs will be able to detect it and if it is >> meta-data block it will be able not only to detect it but also automatically >> and transparently fix it and preserve filesystem consistency. There is a >> simple test you may run - create a pool on top of one disk drive, put some >> files in it then overwrite lets say 20% of the disk drives with some random >> data or zeros while zfs is running. Then flush caches (export/import pool) >> and try to access allmetadata by doing a full ls -lra on a filesystem. You >> should be able to get a full listing with proper attributes, etc. but if you >> check zpool status it will probably report many checksum errors which were >> corrected. (when overwriting overwrite so portion of the beginning of the >> disk as zfs will usually start writing to a disk from the beginning). 
>> Now if you actually try to read file contents, a read will succeed if you are lucky enough to hit a block that was not overwritten; if you are unlucky you won't be able to read the corrupted blocks (since there is no redundancy at the ZFS level it can't fix user data, only detect the damage), but you will still be able to read all the other blocks of the file. Try to do something like this with any other filesystem - you will probably end up with an OS panic, and in many cases fsck won't be able to bring the filesystem back to a point where you can recover any data... and when "fixing" things it will only be guessing what to do and will skip user data entirely.
>>
>> Now there is one specific variant of the above: when the corrupted metadata describes the pool itself or its root block, and it can't be fixed because all copies are bad. ZFS can detect this too, but until very recently the extra functionality to fall back to the N-1 root block in that case was not implemented. This was very unfortunate, but because it was very rare in the field, and resources are limited as usual, it wasn't implemented - instead there was an undocumented, unsupported and hard-to-follow procedure for doing it manually, and some people did use it successfully (check the zfs-discuss archives). Of course it shouldn't be like that, and the ZFS developers recognized this by accepting a bug report on it. But limited resources... Fortunately a built-in mechanism to deal with this case has finally been implemented. So now, when it happens, a user can import the pool with an extra option that rolls back to a previous txg so the pool can be imported, and from then on all the mechanisms described above kick in. Again, there is no guessing here, but a guarantee of detecting corruption and fixing it where possible. And you don't even have to run a check and wait hours, sometimes days, on large filesystems with millions of files before you can access your data (still not being sure exactly what you are accessing and whether it will cause further issues). Of course it would be wise to run a zpool scrub at a convenient time to force all data and metadata to be read, checksummed and fixed where possible, but in the meantime you can run your applications, and any corruption will be detected and fixed as the data is accessed.
>>
>> So from a practical point of view you may think of these ZFS mechanisms as a built-in fsck with the ability to actually detect when corruption happens (instead of just guessing, and for user data as well as metadata) and to fix it if a redundant copy is available, transparently to applications. Having a separate tool doesn't really make sense here. Of course you can always write a script called fsck.zfs which imports a pool and runs zpool scrub if you want (a sketch follows below), and sometimes people will do exactly that before going back into production. But a genuine extra tool like fsck doesn't really make sense - what exactly should such a tool do, keeping all of the above in mind?
>>
>> Then there were a couple of bugs which prevented ZFS from importing a pool with certain specific corruptions that were entirely fixable (AFAIK all known cases are fixed in OpenSolaris).
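[For concreteness, a rough sketch of what that recovery path looks like from the command line. The pool name "tank" is a placeholder, and the -F recovery option is the one added by PSARC 2009/479 in recent OpenSolaris builds - check zpool(1M) on your build before relying on the exact flags.]

    zpool import tank        # a badly corrupted pool may refuse a normal import
    zpool import -F tank     # recovery mode: discard the last few transaction
                             # groups and fall back to an older valid uberblock

    zpool scrub tank         # re-read and repair everything repairable
    zpool status -v tank     # scrub runs in the background; status shows its
                             # progress and lists anything unrepairable

[And the half-joking fsck.zfs wrapper mentioned above really is just a few lines - again a sketch, not a supported tool:]

    #!/bin/ksh
    pool=$1
    zpool import "$pool" 2>/dev/null || zpool import -F "$pool" || exit 1
    zpool scrub "$pool"
    zpool status -v "$pool"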
>> When you think about it, we are talking about bugs here - if you put all the recovery mechanisms into a separate tool called fsck, with the same bugs, it wouldn't be able to repair such a pool either, would it? So you would need to fix those bugs first - but once they are fixed, ZFS itself can mount such a pool and an external tool is still not needed (or, after applying the patch/fix, do alias fsck='zpool import' and then "fsck pool" will get your pool fixed... :)
>>
>> You might ask what you are supposed to do until such a bug is fixed. Well, what would you do if you couldn't mount an ext2 filesystem (or any other) and there was a bug in its fsck which prevented it from getting the fs into a mountable state? You would have to wait for a fix, or fix it yourself, or play with the on-disk format with tools like debugfs, fsdb, ... and try to repair the filesystem manually. Well, on ZFS you also have zdb... or you would probably be forced to recover the data from backup.
>>
>> The point here is that most filesystems and their tools have had such bugs, and ZFS is one of the youngest filesystems on the market, so it is no wonder that such bugs are being fixed now and not 5-7 years ago. A filesystem also needs a critical mass of users, deployed in many different environments, workloads, hardware, drivers, usage patterns, ... before all these corner cases surface, get reported and get fixed. ZFS has only been widely deployed for the last couple of years or so, so it is no surprise that most of these bugs were spotted (and fixed) during that period.
>>
>> But then, thanks to a fundamentally different architecture, once most (all? :)) of these bugs are fixed, ZFS offers something MUCH better than legacy filesystems + fsck. It offers a guarantee of detecting data corruption and fixing it properly when possible, while reporting what can't be fixed and still providing access to all the other data in your pool.
>>
>> btw: this email exchange is private so I don't want to include zfs-discuss without your consent, but if you want to forward this email to zfs-discuss for other users' benefit, feel free to do so.
>>
>>> As the evolution of ZFS has demonstrated, impossible things *do* happen and you *do* need the ability to recover as much as possible. ZFS is busy slapping bandaids over specific problems instead of dealing with the general issue.
>>
>> Just a quick "google" and:
>>
>> 1. fsck fails and causes a panic of the Linux kernel
>> https://bugzilla.redhat.com/show_bug.cgi?id=126238
>>
>> 2. btrfs - the filesystem gets corrupted, running btrfsck causes even more damage and the entire filesystem is nuked due to a bug. btrfs is not the best example as it is far from production ready, but still...
>> https://bugzilla.redhat.com/show_bug.cgi?id=497821
>>
>> 3. Linux gfs2 - fsck has a bug (or missing feature) and cannot fix a filesystem with a specific corruption, but the filesystem is unmountable. The only option is to manually fix the data on disk with help from a support service on a case-by-case basis...
>> https://bugzilla.redhat.com/show_bug.cgi?id=457557
>>
>> 4. e2fsck segfaults and dumps core when trying to check a filesystem
>> https://bugzilla.redhat.com/show_bug.cgi?id=108075
>>
>> 5. ext3 filesystem crashes - fsck can't repair it and goes into an infinite loop...
>> (fixed in a development version of fsck)
>> https://bugzilla.redhat.com/show_bug.cgi?id=467677
>>
>> 6. gfs2 corruption causes a Linux kernel panic... fsck says it fixes the issue, but it doesn't, and the system crashes all over again under load...
>> https://bugzilla.redhat.com/show_bug.cgi?id=519049
>>
>> 7. ext3 filesystem can't be mounted and fsck won't finish after 10 days of running (probably some kind of infinite-loop bug again)
>> http://ubuntuforums.org/archive/index.php/t-394744.html
>>
>> 8. AIX JFS2 filesystem corruption - due to a bug in fsck it can't fix the fs; data had to be recovered from backup
>> http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/error-518-file-system-corruption-366503
>>
>> 9.
>> https://bugzilla.redhat.com/show_bug.cgi?id=514511
>> https://bugzilla.redhat.com/show_bug.cgi?id=477856
>>
>> And there are many more...

You missed the fun vxfs ones where a full fs can corrupt itself so badly that your only option is to restore from backup (fsck won't help you). Then there was the vxfs memory leak on Solaris 10 (it didn't cause corruption, but at some point you had to take outages to work around the problem). Or the 'feature' that was there for a long time, where unclean shutdowns could (not always, but often enough to be annoying) mess up vxvm so badly that you had to run vxprivutil on your LUNs, send the output to Veritas, and they would create a custom file for vxmake to repair the private area, just so you could import the disk group again.

>> The point again is that bugs happen even in fsck, and until they are fixed a common user or sysadmin quite often won't be able to recover on their own. ZFS is no exception here when it comes to bugs. But thanks to its different approach (mostly end-to-end checksumming + COW), its ability to detect data corruption and deal with it exceeds most generally available solutions on the market. The fixes for the bugs mentioned above only make it more robust and reliable, even for those previously unlucky users... :)

And they even happen in 'mature' and 'proven' filesystems too...

> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/pipermail/zfs-discuss
This new PSARC putback that allows rolling back to an earlier valid uberblock is good.

This immediately raises a question: could we use this PSARC functionality to recover deleted files? Or some variation? I don't need that functionality now, but I am just curious...
-- 
This message posted from opensolaris.org
Orvar Korvar wrote:
> This new PSARC putback that allows rolling back to an earlier valid uberblock is good.
>
> This immediately raises a question: could we use this PSARC functionality to recover deleted files? Or some variation? I don't need that functionality now, but I am just curious...

Not really. Uberblocks are associated with transaction groups. A transaction group can contain writes to many different files, and those writes aren't necessarily everything that's in a particular file. You might luck out and restore a particular file in a rollback like this, but you'd probably lose a lot of other data at the same time. We don't offer the ability to roll back if the pool can be opened/imported successfully anyway.

-tim
frequent snapshots offer outstanding "oops" protection. Rob
Maybe to create snapshots "after the fact" as part of some larger disaster recovery effort. (What did my pool/filesystem look like at 10am?... Say, 30 minutes before the database barfed on itself...)

With some enhancements, might this functionality be extendable into a "poor man's CDP" offering? It won't protect against (non-redundant) hardware failures, but it could provide some relief against app/human creativity. Seems like one of those things you never really need... until you need it that one time, at which point nothing else will do.

One would think that using zdb and friends it might be possible to "walk the chain" of txgs backwards, and each good/whole one could be a valid recovery/reset point.

This raises a more fundamental question that perhaps someone can comment on. Does ZFS's COW follow a fairly strict last-released-block, last-overwritten model (keeping a maximum buffer of intact data), or do previously used blocks get overwritten largely based on block/physical location, fragmentation/best-fit, etc.? In the case of blank disks/LUNs, does a 1TB drive, for instance, get completely COW-ed onto its blank space, or does ZFS re-use previously used (and freed) space before burning through the entire disk?

Thanks,

 -- MikeE

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Orvar Korvar
Sent: Monday, November 09, 2009 8:36 AM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] PSARC recover files?

This new PSARC putback that allows rolling back to an earlier valid uberblock is good.

This immediately raises a question: could we use this PSARC functionality to recover deleted files? Or some variation? I don't need that functionality now, but I am just curious...
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/pipermail/zfs-discuss
> Maybe to create snapshots "after the fact"

How does one quiesce a drive "after the fact"?
+------------------------------------------------------------------------------
| On 2009-11-09 12:18:04, Ellis, Mike wrote:
|
| Maybe to create snapshots "after the fact" as part of some larger disaster recovery effort.
| (What did my pool/filesystem look like at 10am?... Say, 30 minutes before the database barfed on itself...)
|
| With some enhancements, might this functionality be extendable into a "poor man's CDP" offering? It won't protect against (non-redundant) hardware failures, but it could provide some relief against app/human creativity.

Alternatively, you can write a cron job/service that takes snapshots of your important filesystems. I take hourly snaps of all of our homedirs, and five-minute snaps of our database volumes (InnoDB and Postgres both recover adequately; I have used these snaps to build recovery zones to pull accidentally deleted data from before - good times).

Look at OpenSolaris' Time Slider service, although writing something that does this yourself is pretty trivial (we use a Perl program with YAML configs launched by cron every minute; a bare-bones shell version is sketched below). My one suggestion would be to make sure the automatically taken snaps have a unique name (@auto, or whatever), so you can do bulk expiry tomorrow or next week without worry.

Cheers.

-- 
bda
cyberpunk is dead. long live cyberpunk.
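[Not the Perl/YAML tool mentioned above - just a minimal shell sketch of the same idea so the shape of it is clear. The dataset name, the "auto" prefix and the retention count are placeholders; schedule it from cron at whatever interval suits you.]

    #!/bin/ksh
    # Take a snapshot with a unique, greppable prefix and expire old ones.
    fs=tank/home          # placeholder dataset
    prefix=auto           # unique prefix so bulk expiry is safe
    keep=24               # how many of these snapshots to retain

    zfs snapshot "${fs}@${prefix}-$(date '+%Y%m%d-%H%M')"

    # List our snapshots oldest-first and destroy all but the newest $keep.
    snaps=$(zfs list -H -t snapshot -o name -s creation -r "$fs" | grep "@${prefix}-")
    total=$(echo "$snaps" | wc -l)
    drop=$((total - keep))
    if [ "$drop" -gt 0 ]; then
        echo "$snaps" | head -"$drop" | while read snap; do
            zfs destroy "$snap"
        done
    fi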
On Thu Nov 5 14:38:13 PST 2009, Gary Mills wrote:
> It would be nice to see this information at:
> http://hub.opensolaris.org/bin/view/Community+Group+on/126-130
> but it hasn't changed since 23 October.

Well, it seems we have an answer:

http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033672.html

On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
> The flag days page has not been updated since the switch
> to XWiki, it's on my todo list but I don't have an ETA
> for when it'll be done.

Perhaps anyone interested in seeing the flag days page resurrected can petition James to raise the priority on his todo list.

Thanks
Nigel Smith
-- 
This message posted from opensolaris.org
Nigel Smith wrote:
> On Thu Nov 5 14:38:13 PST 2009, Gary Mills wrote:
>> It would be nice to see this information at:
>> http://hub.opensolaris.org/bin/view/Community+Group+on/126-130
>> but it hasn't changed since 23 October.
>
> Well, it seems we have an answer:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033672.html
>
> On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
>> The flag days page has not been updated since the switch
>> to XWiki, it's on my todo list but I don't have an ETA
>> for when it'll be done.
>
> Perhaps anyone interested in seeing the flag days page
> resurrected can petition James to raise the priority on
> his todo list.

Nigel,
*everybody* is interested in the flag days page. Including me.
Asking me to "raise the priority" is not helpful.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
Hi James

James C. McPherson wrote:
> *everybody* is interested in the flag days page. Including me.
> Asking me to "raise the priority" is not helpful.

From my perspective, it's a surprise that 'everybody' is interested, as I'm not seeing a lot of people complaining that the flag days page is not updating. Only a couple of people on this list, and one of those is me! Perhaps I'm looking in the wrong places.

I'm prepared to admit that I may well have misjudged the situation, due to my lack of a full overview. I'm sorry if my forum posts regarding this have not been helpful, as my only intention was to try to be helpful.

Best Regards
Nigel Smith
-- 
This message posted from opensolaris.org
Hi,

>> *everybody* is interested in the flag days page. Including me.
>> Asking me to "raise the priority" is not helpful.
>
> From my perspective, it's a surprise that 'everybody' is interested, as I'm
> not seeing a lot of people complaining that the flag days page is not updating.
> Only a couple of people on this list, and one of those is me!
> Perhaps I'm looking in the wrong places.

I used this page frequently, too. But now I'm just using the twitter account fed by onnv-notify. You can find it at http://twitter.com/codenews ...

Regards
Joerg
Say I end up with a handful of unrecoverable bad blocks that just so happen to be referenced by ALL of my snapshots (in some file that's been around forever). Say I don't care about the file or two in which the bad blocks exist. Is there any way to purge those blocks from the pool (and all snapshots) without having to restore the whole pool from backup?
-- 
This message posted from opensolaris.org
On Tue, Nov 10, 2009 at 2:40 PM, BJ Quinn <bjquinn at seidal.com> wrote:
> Say I end up with a handful of unrecoverable bad blocks that just so happen
> to be referenced by ALL of my snapshots (in some file that's been around
> forever). Say I don't care about the file or two in which the bad blocks
> exist. Is there any way to purge those blocks from the pool (and all
> snapshots) without having to restore the whole pool from backup?

No. The whole point of a snapshot is to keep a consistent on-disk state from a certain point in time. I'm not entirely sure how you managed to corrupt blocks that are part of an existing snapshot though, as they'd be read-only. The only way that should even be able to happen is if you took a snapshot after the blocks were already corrupted. Any new writes would be allocated from new blocks.

--Tim
On Tue, Nov 10, 2009 at 03:04:24PM -0600, Tim Cook wrote:
> No. The whole point of a snapshot is to keep a consistent on-disk state
> from a certain point in time. I'm not entirely sure how you managed to
> corrupt blocks that are part of an existing snapshot though, as they'd be
> read-only.

Physical corruption of the media.
Something outside of ZFS diddling bits on storage.

> The only way that should even be able to happen is if you took a
> snapshot after the blocks were already corrupted. Any new writes would be
> allocated from new blocks.

It can be corrupted while it sits on disk. Since it's read-only, you can't force it to allocate anything and clean itself up.

-- 
Darren
On Tue, Nov 10, 2009 at 3:19 PM, A Darren Dunham <ddunham at taos.com> wrote:
> On Tue, Nov 10, 2009 at 03:04:24PM -0600, Tim Cook wrote:
>> No. The whole point of a snapshot is to keep a consistent on-disk state
>> from a certain point in time. I'm not entirely sure how you managed to
>> corrupt blocks that are part of an existing snapshot though, as they'd be
>> read-only.
>
> Physical corruption of the media.
> Something outside of ZFS diddling bits on storage.
>
>> The only way that should even be able to happen is if you took a
>> snapshot after the blocks were already corrupted. Any new writes would be
>> allocated from new blocks.
>
> It can be corrupted while it sits on disk. Since it's read-only, you
> can't force it to allocate anything and clean itself up.

You're telling me a scrub won't actively clean up corruption in snapshots? That sounds absolutely absurd to me.

--Tim
On Tue, Nov 10, 2009 at 03:33:22PM -0600, Tim Cook wrote:
> You're telling me a scrub won't actively clean up corruption in snapshots?
> That sounds absolutely absurd to me.

Depends on how much redundancy you have in your pool. If you have no mirrors, no RAID-Z, and no ditto blocks for data, well, you have no redundancy, and ZFS won't be able to recover affected files.

Nico
--
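[To make that concrete, a short sketch of the redundancy knobs being referred to. Pool, device and dataset names are placeholders; note that copies=2 only affects blocks written after it is set, and does not protect against losing the whole disk.]

    # mirror (or raidz) redundancy is chosen when the vdev is created:
    zpool create tank mirror c1t2d0 c1t3d0

    # ditto blocks for user data can be requested per filesystem
    # (metadata already gets extra copies automatically):
    zfs set copies=2 tank/important

    # with either in place, a scrub can rewrite damaged blocks from a
    # good copy; without redundancy it can only detect and report them:
    zpool scrub tank
    zpool status -v tank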
I believe it was physical corruption of the media. The strange thing is that the last time it happened to me, it also managed to replicate the bad blocks over to my backup server, replicated with SNDR...

And yes, it IS read-only, and a scrub will NOT actively clean up corruption in snapshots. It will DETECT corruption, but if it's unrecoverable, that's that. It's unrecoverable.

If there's not enough redundancy in the pool, I'm OK with the data not being recoverable. But wouldn't there be a way to purge out the bad blocks if, for example, they were only in a single bad file out of millions of files, and I didn't care about the file in question? I don't want to recover the file; I want a working version of my pool+snapshots minus the tiny bit that was obviously corrupt.

Barring another solution, I'd have to take the pool in question, delete the bad file, and delete ALL the snapshots. Then restore the old snapshots from backup to another pool, and copy the current data from the pool that had the problem over to the new pool (roughly the procedure sketched below). That way I can get most of my snapshots back, with the best known current data sitting on top as the active data set. The problem is that with hundreds of snapshots plus compression, zfs send/recv takes over 24 hours to restore a full backup like that to a new storage device. Last time this happened to me, I just had to say goodbye to all my snapshots and deal with it, all over a couple of kilobytes of temp files.
-- 
This message posted from opensolaris.org
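[Roughly what that rebuild looks like, sketched with placeholder names: "tank" is the damaged pool, "newtank" the replacement, and "backuphost"/"backup" the replicated copy. The expensive step is the full send/recv of the snapshot history, exactly as described above.]

    # 1. On the damaged pool: remove the corrupt file and, unavoidably,
    #    every snapshot that still references its bad blocks.
    rm /tank/data/corrupt.tmp
    zfs list -H -t snapshot -o name -r tank/data | while read snap; do
        zfs destroy "$snap"
    done

    # 2. Rebuild the snapshot history on a fresh pool from the backup
    #    (-R sends the dataset together with all of its snapshots).
    #    This is the step that can take a day or more.
    ssh backuphost "zfs send -R backup/data@latest" | zfs recv -F newtank/data

    # 3. Copy the current, cleaned-up data on top as the new active set.
    rsync -a /tank/data/ /newtank/data/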