thr3ads.net - zfs discuss - [zfs-discuss] Requesting a corrupted data block as-is [Dec 2005]

Andrew

2005-Dec-13 19:02 UTC

[zfs-discuss] Requesting a corrupted data block as-is

If ZFS experiences a checksum error on a block that's part of a file
being read by a user process, and the block is in an unreplicated pool, it logs
the error and signals an I/O error rather than return data to the process. Is
there a way for a process to request the corrupted block as-is? The block might
still be of some value, especially since the corruption is likely due to just a
single flipped bit, and for some data (e.g. audio and video data) a slightly
corrupted block is much better than a missing or zeroed block. Also for textual
data, especially text intended only for human consumption, a mostly-correct
block with a single flipped bit (and thus normally a single corrupted character)
is much better than losing many characters.
At the very least, there needs to be an administrative tool to update the block
checksums for a file to match the blocks as they actually exist on disk, so that
user processes can access the corrupted file.
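A minimal sketch of the current behaviour plus the kind of opt-in raw read being asked for. It assumes SHA-256 as a stand-in for whatever checksum the pool uses, and the names `read_block`, `BlockRead`, `ChecksumError` and `allow_corrupt` are hypothetical; this is not a ZFS interface. By default a mismatch is an error. The opt-in returns the bytes but keeps them flagged as unverified, so the damage is not silently forgotten.

```python
import hashlib
from dataclasses import dataclass


class ChecksumError(IOError):
    """Raised when a block's contents do not match its stored checksum."""


@dataclass
class BlockRead:
    data: bytes
    checksum_ok: bool  # False means the caller received unverified bytes


def read_block(data: bytes, expected: bytes, allow_corrupt: bool = False) -> BlockRead:
    """Verify a block against its expected checksum before handing it out.

    Without allow_corrupt a mismatch raises ChecksumError, like the I/O
    error described above. With allow_corrupt=True (hypothetical opt-in)
    the raw bytes are returned, explicitly marked as failing verification.
    """
    ok = hashlib.sha256(data).digest() == expected
    if not ok and not allow_corrupt:
        raise ChecksumError("checksum mismatch")
    return BlockRead(data=data, checksum_ok=ok)


# A block with one flipped bit: refused by default, readable on request.
block = b"audio frame " * 40
stored = hashlib.sha256(block).digest()
damaged = bytearray(block)
damaged[5] ^= 0x10
result = read_block(bytes(damaged), stored, allow_corrupt=True)
print(result.checksum_ok)  # False
```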
This message posted from opensolaris.org

Eric Schrock

2005-Dec-13 20:49 UTC

[zfs-discuss] Requesting a corrupted data block as-is

On Tue, Dec 13, 2005 at 11:02:10AM -0800, Andrew wrote:
> If ZFS experiences a checksum error on a block that's part of a file
> being read by a user process, and the block is in an unreplicated
> pool, it logs the error and signals an I/O error rather than return
> data to the process. Is there a way for a process to request the
> corrupted block as-is?
Not currently.  See:

6186106 ZFS needs a mechanism to read blocks with bad checksums

If you have any suggestions about what such a mechanism would look like,
please let us know.
> The block might still be of some value,
> especially since the corruption is likely due to just a single flipped
> bit.
This is not necessarily true.  See this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=3705
> and for some data (e.g. audio and video data) a slightly
> corrupted block is much better than a missing or zeroed block. Also
> for textual data, especially text intended only for human consumption,
> a mostly-correct block with a single flipped bit (and thus normally a
> single corrupted character) is much better than losing many
> characters.  At the very least, there needs to be an administrative
> tool to update the block checksums for a file to match the blocks as
> they actually exist on disk, so that user processes can access the
> corrupted file.
This is a pretty dangerous thing to do.  First of all, it would be
impossible for metadata - simply 'fixing' the checksum would most likely
result in panics and arbitrary corruption. It would be possible to do
this for data blocks, but as soon as you do, you lose any way to
identify these blocks after the fact.

I think we need to understand some real-world cases of corruption as
well as specific consumers that would benefit, before we decide that
this is a required feature.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock

Andrew

2005-Dec-14 00:35 UTC

[zfs-discuss] Re: Requesting a corrupted data block as-is

> > Is there a way for a process to request the
> > corrupted block as-is?
> 
> Not currently.  See:
> 
> 6186106 ZFS needs a mechanism to read blocks with bad checksums

Google search for "ZFS needs a mechanism to read blocks with bad
checksums" turns up nothing. Google search for 6186106 ZFS turns up
nothing.
> > The block might still be of some value,
> > especially since the corruption is likely due to just a single flipped
> > bit.
> 
> This is not necessarily true.  See this thread:
> 
> http://www.opensolaris.org/jive/thread.jspa?threadID=3705

I stand corrected. However, for a single-block failure, ZFS is still guaranteed
to excise the entire glob of corrupted data, and almost certainly some good data
along with it, so my point still stands: for some applications (notably audio
and video) it's better to get a block of partially-good data with some random
garbage mixed in than to get an entirely unreadable block.
> This is a pretty dangerous thing to do.  First of all, it would be
> impossible for metadata - simply 'fixing' the checksum would most likely
> result in panics and arbitrary corruption.

True. But even in this case, at least allowing the data to be manually read
(though not interpreted by the filesystem) could be helpful, because it could
allow manual partial recovery of directory contents.
> It would be possible to do this for data blocks, but as soon as you do,
> you lose any way to identify these blocks after the fact.

Well, ZFS would have already informed the administrator about those blocks; at
that point, if he tells ZFS to rechecksum them, it's his responsibility
to identify them after the fact. The most practical course of action would
probably be to request a full-pool search through all datasets to find all files
and volumes which reference the block, flag them in some way as being corrupted
and note the offending offset range, and then tell ZFS to rechecksum the block.
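The recovery procedure proposed above can be sketched as a toy, in-memory model: find every dataset object that references the damaged block, record the affected byte ranges in a ledger, and only then accept the block's current contents by storing a new checksum. `Pool`, `CorruptRange`, `record_and_rechecksum`, the pool name `tank` and the reference layout are all hypothetical and do not correspond to ZFS on-disk structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CorruptRange:
    dataset: str   # filesystem, volume or snapshot name
    obj: str       # file or volume within the dataset
    offset: int    # byte offset of the damaged block within the object
    length: int    # number of bytes covered by the damaged block


@dataclass
class Pool:
    # dataset -> object -> list of (block_id, offset, length) extents (toy model)
    refs: dict
    # block_id -> stored checksum (stand-in for the checksum in a block pointer)
    checksums: dict
    ledger: list = field(default_factory=list)


def record_and_rechecksum(pool: Pool, block_id: int, new_checksum: bytes) -> list:
    """Flag every object that references block_id, then accept the block as-is."""
    found = [
        CorruptRange(ds, name, offset, length)
        for ds, objects in pool.refs.items()
        for name, extents in objects.items()
        for bid, offset, length in extents
        if bid == block_id
    ]
    # Record the damage first, so it stays identifiable after the new
    # checksum stops reporting it.
    pool.ledger.extend(found)
    pool.checksums[block_id] = new_checksum
    return found
```

A snapshot that shares the block is flagged along with the live file, which is the point of searching every dataset rather than just the one being read.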

Eric Schrock

2005-Dec-14 01:01 UTC

[zfs-discuss] Re: Requesting a corrupted data block as-is

On Tue, Dec 13, 2005 at 04:35:19PM -0800, Andrew wrote:
>
> Google search for "ZFS needs a mechanism to read blocks with bad
> checksums" turns up nothing.  Google search for 6186106 ZFS turns up
> nothing.
Sorry, this is a reference to a Solaris bug.  See the OpenSolaris bug
database for more info.

We understand your concerns, but some of the problems (such as metadata
reconstruction) are next to impossible to accomplish in any programmatic
way.  We can see the benefit of providing a means to access corrupted
blocks, hence the RFE.  But in the face of compression and arbitrarily
corrupted metadata, it's simply not very high on our list.

As always, feel free to play around and prove us wrong.  libzpool/zdb
provides a safe way to read (but not write) pool data in userland so you
can experiment without panicking your system ;-)

- Eric

Richard Elling

2005-Dec-14 01:19 UTC

[zfs-discuss] Re: Re: Requesting a corrupted data block as-is

> We understand your concerns, but some of the problems (such as metadata
> reconstruction) are next to impossible to accomplish in any programmatic
> way.  We can see the benefit of providing a means to access corrupted
> blocks, hence the RFE.  But in the face of compression and arbitrarily
> corrupted metadata, it's simply not very high on our list.
Don't underestimate this barrier.  For compressed or (future?) encrypted data,
even a single bit flip has consequences far beyond the block in question.
If you allow this to be automatically propagated, then life gets real bad, real
quick for most folks.  It is better to flag the problem and let a human deal
with it.
 -- richard
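To make the compression point concrete, here is a small experiment. It assumes zlib's raw deflate as a stand-in for a block compressor (it is not the compressor ZFS uses), and the text, sampling steps and helper names are illustrative choices. It flips single bits in an uncompressed buffer and in its compressed form, then counts how many bytes of the recovered data differ from the original; an undecodable stream counts as a total loss.

```python
import zlib


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def damaged_bytes(original: bytes, recovered: bytes) -> int:
    """Count bytes that differ; missing or extra bytes also count as damaged."""
    diff = sum(a != b for a, b in zip(original, recovered))
    return diff + abs(len(original) - len(recovered))


def raw_deflate(data: bytes) -> bytes:
    # wbits=-15 produces a raw deflate stream with no header or trailing
    # checksum, so corruption shows up in the output instead of being caught.
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(blob: bytes) -> bytes:
    decomp = zlib.decompressobj(-15)
    try:
        return decomp.decompress(blob) + decomp.flush()
    except zlib.error:
        return b""  # undecodable: nothing usable comes back


text = b"".join(b"%04d the quick brown fox jumps over the lazy dog\n" % i for i in range(100))
packed = raw_deflate(text)

# Uncompressed: a single flipped bit damages exactly one byte.
plain_damage = [damaged_bytes(text, flip_bit(text, b)) for b in range(0, len(text) * 8, 61)]

# Compressed: the same kind of flip can corrupt far more, up to the whole block.
packed_damage = [damaged_bytes(text, raw_inflate(flip_bit(packed, b)))
                 for b in range(1, len(packed) * 8, 5)]

print("uncompressed: max bytes damaged per flip =", max(plain_damage))
print("compressed:   max bytes damaged per flip =", max(packed_damage))
print("compressed:   mean bytes damaged per flip = %.1f" % (sum(packed_damage) / len(packed_damage)))
```

With a checksummed stream (plain `zlib.decompress`), most of these flips would instead be rejected outright, which is the all-or-nothing outcome the thread describes.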

Casper.Dik at Sun.COM

2005-Dec-14 09:15 UTC

[zfs-discuss] Re: Requesting a corrupted data block as-is

>We understand your concerns, but some of the problems (such as metadata
>reconstruction) are next to impossible to accomplish in any programmatic
>way.  We can see the benefit of providing a means to access corrupted
>blocks, hence the RFE.  But in the face of compression and arbitrarily
>corrupted metadata, it's simply not very high on our list.
>
>As always, feel free to play around and prove us wrong.  libzpool/zdb
>provides a safe way to read (but not write) pool data in userland so you
>can experiment without panicking your system ;-)
If such corruption happens and you fix it (any which way); would it be
possible for ZFS to have leaked some blocks which are now unreachable?

Casper

Eric Schrock

2005-Dec-14 15:29 UTC

[zfs-discuss] Re: Requesting a corrupted data block as-is

On Wed, Dec 14, 2005 at 10:15:09AM +0100, Casper.Dik at sun.com wrote:
> 
> If such corruption happens and you fix it (any which way); would it be
> possible for ZFS to have leaked some blocks which are now unreachable?
> 
Yes.  We have talked about various ways to handle this, but we haven't
had any wonderful ideas yet.  Right now, we could just leak the blocks
forever, or do offline garbage collection of blocks (which is slow and
not very ZFS-like because it needs to be offline).

- Eric
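A minimal sketch of the offline garbage collection mentioned above, as a mark-and-sweep over a toy block graph: walk every block reachable from the roots, then report allocated blocks that were never reached. The dictionary-based graph and the `find_leaked` name are assumptions for illustration; real ZFS block pointers and space maps are far more involved, and doing this over a whole pool is exactly why it is slow.

```python
from collections import deque


def find_leaked(roots, children, allocated):
    """Return allocated blocks that cannot be reached from the roots.

    roots     -- block ids at the top of the tree (in ZFS terms, whatever the
                 uberblock points to)
    children  -- dict: block id -> ids of the blocks it points to
    allocated -- set of block ids the allocator believes are in use
    """
    reachable = set()
    pending = deque(roots)
    while pending:                      # mark phase: breadth-first walk
        blk = pending.popleft()
        if blk in reachable:
            continue
        reachable.add(blk)
        pending.extend(children.get(blk, ()))
    return set(allocated) - reachable   # sweep phase: allocated but unmarked


# Block 5 was the only parent of 6; once the pointer to 5 is lost
# (say, while repairing metadata), both are leaked.
children = {1: [2, 3], 2: [4], 3: [], 5: [6]}
print(sorted(find_leaked([1], children, {1, 2, 3, 4, 5, 6})))  # [5, 6]
```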

Casper.Dik at Sun.COM

2005-Dec-14 16:22 UTC

head link

[zfs-discuss] Re: Requesting a corrupted data block as-is

>On Wed, Dec 14, 2005 at 10:15:09AM +0100, Casper.Dik at sun.com wrote:
>> 
>> If such corruption happens and you fix it (any which way); would it be
>> possible for ZFS to have leaked some blocks which are now unreachable?
>> 
>
>Yes.  We have talked about various ways to handle this, but we haven't
>had any wonderful ideas yet.  Right now, we could just leak the blocks
>forever, or do offline garbage collection of blocks (which is slow and
>not very ZFS-like because it needs to be offline).
Considering that this only happens on metadata corruption issues,
this should be a rare occurrence anyway.

Casper

George Paplas

2005-Dec-19 01:08 UTC

[zfs-discuss] Re: Requesting a corrupted data block as-is

--- Eric Schrock <eric.schrock at sun.com> wrote:
> On Tue, Dec 13, 2005 at 04:35:19PM -0800, Andrew wrote:
> >
> > Google search for "ZFS needs a mechanism to read blocks with bad
> > checksums" turns up nothing.  Google search for 6186106 ZFS turns up
> > nothing.
> 
> Sorry, this is a reference to a Solaris bug.  See the OpenSolaris bug
> database for more info.
> 
> We understand your concerns, but some of the problems (such as metadata
> reconstruction) are next to impossible to accomplish in any programmatic
> way.  We can see the benefit of providing a means to access corrupted
> blocks, hence the RFE.  But in the face of compression and arbitrarily
> corrupted metadata, it's simply not very high on our list.
I guess that if the checksum of a block fails and there is no way to heal it,
then you could try brute-force bit flipping until the checksum matches.

Probably you will need to limit the total number of bit flips in order to be
reasonably fast and safe from checksum collisions.
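The brute-force idea can be sketched as follows, assuming SHA-256 as the block checksum and a tiny block so the search finishes quickly. `brute_force_repair`, `flip_bits` and `max_flips` are hypothetical names; `max_flips` is the limit suggested above. With a 256-bit checksum a false match is practically impossible at this scale, but with a weaker checksum the risk of accepting a wrong candidate grows as the search widens.

```python
import hashlib
from itertools import combinations
from typing import Iterable, Optional


def flip_bits(buf: bytes, positions: Iterable[int]) -> bytes:
    """Return a copy of buf with each listed bit position inverted."""
    out = bytearray(buf)
    for bit in positions:
        out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def brute_force_repair(block: bytes, expected: bytes, max_flips: int = 2) -> Optional[bytes]:
    """Search for a variant of block, at most max_flips bits away, whose checksum matches.

    Returns the first matching candidate, or None if the search is exhausted.
    """
    if hashlib.sha256(block).digest() == expected:
        return block
    nbits = len(block) * 8
    for k in range(1, max_flips + 1):          # fewest flips first
        for positions in combinations(range(nbits), k):
            candidate = flip_bits(block, positions)
            if hashlib.sha256(candidate).digest() == expected:
                return candidate
    return None


original = bytes(range(32))                    # a tiny 32-byte "block"
checksum = hashlib.sha256(original).digest()
damaged = flip_bits(original, [3, 200])        # two flipped bits
print(brute_force_repair(damaged, checksum) == original)  # True
```

Even for this 32-byte block the two-flip pass tries up to 32,640 candidates; the count grows roughly with the square of the block size.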

> 
> As always, feel free to play around and prove us wrong.  libzpool/zdb
> provides a safe way to read (but not write) pool data in userland so you
> can experiment without panicking your system ;-)
> 
> - Eric
Andrew

2005-Dec-19 16:20 UTC

[zfs-discuss] Re: Re: Requesting a corrupted data block as-is

> I guess that if the checksum of a block fails and there is no way to heal it,
> then you could try brute-force bit flipping until the checksum matches.

In other words, invert the hash using the corrupted block as a hint.

> Probably you will need to limit the total number of bit flips in order to be
> reasonably fast and safe from checksum collisions.

Speed, rather than collisions, will be the limiting factor for the foreseeable
future. Brute-forcing a 128kB block (the maximum ZFS block size) would require
a million full-block hashes to test one flipped bit, and a trillion for two; the
latter would take months on a contemporary system. Even for a 512-byte block
(the minimum ZFS block size), testing four flipped bits would require hundreds
of trillions of full-block hashes, which would take months.
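The search-space sizes above can be checked by counting how many distinct sets of bit positions must be tried, using `math.comb`. For a 128kB (131,072-byte) block this gives about 1.05 million candidates for one flipped bit and about 5.5 × 10^11 for two. For a 512-byte block and four flipped bits, the number of distinct combinations is about 1.2 × 10^13; the figure of hundreds of trillions corresponds to counting ordered choices, 4096^4 ≈ 2.8 × 10^14. Either way each candidate costs a full-block hash, so the conclusion about speed holds.

```python
from math import comb


def candidates(block_bytes: int, flips: int) -> int:
    """Distinct ways to choose `flips` bit positions in a block of block_bytes bytes."""
    return comb(block_bytes * 8, flips)


for size, flips in [(128 * 1024, 1), (128 * 1024, 2), (512, 4)]:
    print(f"{size:>6}-byte block, {flips} flipped bit(s): {candidates(size, flips):.2e} candidates")
```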