Hey all,

New to ZFS, I made a critical error when migrating data and configuring zpools according to needs - I stored a snapshot stream to a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]". When I attempted to receive the stream onto the newly configured pool, I ended up with a checksum mismatch and thought I had lost my data.

After googling the issue and finding nil, I downloaded FreeBSD 9-CURRENT (development), installed it, and recompiled the kernel with one modification to "/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c": comment out the following lines (1439-1440 at the time of writing):

    if (!ZIO_CHECKSUM_EQUAL(drre.drr_checksum, pcksum))
        ra.err = ECKSUM;

Once recompiled and booted on the new kernel, I executed "zfs receive -v [filesystem] <[stream_file]". Once received, I scrubbed the zpool, which corrected a couple of checksum errors, and proceeded to finish setting up my NAS.

Hopefully this might help someone else if they're stupid enough to make the same mistake I did...

Note: changing this section of the ZFS kernel code should not be used for anything other than special cases when you need to bypass the data integrity checks for recovery purposes.

-Johnny Walker
On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:

> New to ZFS, I made a critical error when migrating data and
> configuring zpools according to needs - I stored a snapshot stream to
> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".

Why is this a critical error? I thought you were supposed to be able to save the output from zfs send to a file (just as with tar or ufsdump you can save the output to a file or a stream).

> When I attempted to receive the stream onto the newly configured
> pool, I ended up with a checksum mismatch and thought I had lost my
> data.

Was the cause of the checksum mismatch just that the stream data was stored as a file? That does not seem right to me.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> Why is this a critical error, I thought you were supposed to be
> able to save the output from zfs send to a file (just as with tar or
> ufsdump you can save the output to a file or a stream) ?

Well yes, you can save the stream to a file, but it is intended for immediate use with "zfs receive". Since the stream is not an image but rather a serialization of objects, normal data recovery methods do not apply in the event of corruption.

>> When I attempted to receive the stream onto the newly configured
>> pool, I ended up with a checksum mismatch and thought I had lost my
>> data.
>
> Was the cause of the checksum mismatch just that the stream data
> was stored as a file ? That does not seem right to me.

I really can't say for sure what caused the corruption, but I think it may have been related to a dying power supply. For more information, check out:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storing_ZFS_Snapshot_Streams_.28zfs_send.2Freceive.29
2011-06-09 18:52, Paul Kraus wrote:

> On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> Why is this a critical error, I thought you were supposed to be
> able to save the output from zfs send to a file (just as with tar or
> ufsdump you can save the output to a file or a stream) ?
> Was the cause of the checksum mismatch just that the stream data
> was stored as a file ? That does not seem right to me.

As recently mentioned on the list (regarding tape backups, I believe), the zfs send stream format was not intended for long-term storage. If some bits in the saved file flipped, the stream becomes invalid regarding checksums and has to be resent. Besides, the format is not public and subject to change, I think, so future compatibility is not guaranteed.

Having said that, I have used dumping "zfs send" to files, rsyncing them over a slow connection, and zfs recv'ing them on another machine - so this is known to work. However, if it were to fail, I could retry (and/or use rsync to correct some misreceived blocks if the network was faulty).

--
+============================================================+
| Jim Klimov,                CTO                             |
| JSC COS&HT                                                 |
| +7-903-7705859 (cellular)  mailto:jimklimov at cos.ru      |
| CC: admin at cos.ru, jimklimov at mail.ru                  |
+============================================================+
| () ascii ribbon campaign - against html mail               |
| /\                       - against microsoft attachments   |
+============================================================+
On Thu, Jun 9, 2011 at 1:17 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> 2011-06-09 18:52, Paul Kraus wrote:
>> On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:
>>> New to ZFS, I made a critical error when migrating data and
>>> configuring zpools according to needs - I stored a snapshot stream to
>>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>>
>> Why is this a critical error, I thought you were supposed to be
>> able to save the output from zfs send to a file (just as with tar or
>> ufsdump you can save the output to a file or a stream) ?
>> Was the cause of the checksum mismatch just that the stream data
>> was stored as a file ? That does not seem right to me.
>
> As recently mentioned on the list (regarding tape backups, I believe)
> the zfs send stream format was not intended for long-term storage.

Only due to possible changes in the format.

> If some bits in the saved file flipped,

Then you have a bigger problem, namely that the file was corrupted. That is not a limitation of the zfs send format. If the stream gets corrupted via network transmission you have the same problem.

> the stream becomes invalid regarding checksums and has to be resent.
> Besides, the format is not public and subject to change, I think.
> So future compatibility is not guaranteed.

Recent documentation (the zfs man page) indicates that as of zpool/zfs version 15/4, I think, the stream format was committed, and receiving a stream from a given zfs dataset is supported on _newer_ zfs versions.

> Having said that, I have used dumping "zfs send" to files, rsyncing
> them over a slow connection, and zfs recv'ing them on another
> machine - so this is known to work.

I suppose to move data or for an initial copy that makes sense, but for long-term replication why not just use incremental zfs sends?

> However, if it were to fail, I could retry (and/or use rsync to correct
> some misreceived blocks if the network was faulty).

At some level we need to trust that the zfs send stream is intact.
2011-06-09 21:33, Paul Kraus wrote:
>> If some bits in the saved file flipped,
> Then you have a bigger problem, namely that the file was corrupted.
> That is not a limitation of the zfs send format. If the stream gets
> corrupted via network transmission you have the same problem.

No, it is not quite a limitation, but the longer you store a file, and the huger it is, the greater the probability of a single bit going wrong over time (i.e. on an old tape stored in a closet). And ZFS is very picky about having detected a non-integrity condition. Where other filesystems would feed you a broken file, and perhaps some other layer of integrity would be there to fix it, or you'd choose to ignore it, zfs will refuse to process known-bad data. As the original poster has shown, even within ZFS this problem can be worked around... if only ZFS would ask the admin what to do. Kudos to him for that! ;)

And because of a small chunk you may lose everything ;) I've had that happen under a customer's VMware ESX 3.0, which did not honour cache flushes, so ZFS broke down upon hardware resets (i.e. thermal failure) and panicked the kernel upon boot attempts. Reverting that virtual Solaris server to use UFS was sad - but it worked for years since then, even through such mischiefs as hardware thermal resets. I've tested that VM's image recently with the OI_151 dev LiveCD - even it panics on that pool. It took aok=1 and zfs_recover=1 and "zpool import -o ro -f -F pool" to roll back those last bad transactions. BTW, "-F -n" was not honoured - the pool was imported and the transactions were rolled back despite the message along the lines of "Would be able to recover to timestamp XXX"...

>> Having said that, I have used dumping "zfs send" to files, rsyncing
>> them over a slow connection, and zfs recv'ing them on another
>> machine - so this is known to work.
> I suppose to move data or for an initial copy that makes sense, but
> for long-term replication why not just use incremental zfs sends?

This was an initial copy (backing up a number of server setups from a customer) with tens of GB to send over a flaky 1Mbit link. It took many retries, and zfs send is not strong at retrying ;)

//Jim Klimov
On 09/06/11 1:33 PM, Paul Kraus wrote:
> On Thu, Jun 9, 2011 at 1:17 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>> As recently mentioned on the list (regarding tape backups, I believe)
>> the zfs send stream format was not intended for long-term storage.
>
> Only due to possible changes in the format.
>
>> If some bits in the saved file flipped,
>
> Then you have a bigger problem, namely that the file was corrupted.

This fragility is one of the main reasons it has always been discouraged (and regularly, on this list) as an archive format.

--Toby

> ...
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> Besides, the format is not public and subject to change, I think.
> So future compatibility is not guaranteed.

That is not correct. Years ago, there was a comment in the man page that said this:

"The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS."

But in the last several years, backward/forward compatibility has always been preserved, so despite the warning, it was never a problem. In more recent versions, the man page says:

"The format of the stream is committed. You will be able to receive your streams on future versions of ZFS."
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jonathan Walker
>
> New to ZFS, I made a critical error when migrating data and
> configuring zpools according to needs - I stored a snapshot stream to
> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".

There are precisely two reasons why it's not recommended to store a zfs send datastream for later use. As long as you can acknowledge and accept these limitations, then sure, go right ahead and store it. ;-) A lot of people do, and it's good.

#1 A single bit error causes a checksum mismatch, and then the whole data stream is not receivable. Obviously you encountered this problem already, and you were able to work around it. If I were you, however, I would be skeptical about data integrity on your system. You said you scrubbed and corrected a couple of errors, but that's not actually possible. The filesystem integrity checksums are for detection, not correction, of corruption. The only way corruption gets corrected is when there's a redundant copy of the data... Then ZFS can discard the corrupt copy, overwrite it with a good copy, and all the checksums suddenly match. Of course there is no such thing in the zfs send data stream - no redundant copy in the data stream. So yes, you have corruption. The best you can possibly do is to identify where it is, and then remove the affected files.

#2 You cannot do a partial receive, nor generate a catalog of the files within the datastream. You can restore the whole filesystem or nothing.
On 06/10/11 12:47, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jonathan Walker
>>
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> There are precisely two reasons why it's not recommended to store a zfs
> send datastream for later use. As long as you can acknowledge and accept
> these limitations, then sure, go right ahead and store it. ;-) A lot of
> people do, and it's good.

Not recommended by whom? Which documentation says this?

As I pointed out last time this came up, the NDMP service on Solaris 11 Express and on the Oracle ZFS Storage Appliance uses the 'zfs send' stream as what is to be stored on the "tape".

--
Darren J Moffat
2011-06-10 15:58, Darren J Moffat wrote:
> As I pointed out last time this came up, the NDMP service on Solaris 11
> Express and on the Oracle ZFS Storage Appliance uses the 'zfs send'
> stream as what is to be stored on the "tape".

This discussion turns interesting ;)

Just curious: how do these products work around the stream fragility we are discussing here - that a single-bit error can/will/should make the whole zfs send stream invalid, even though it is probably an error localized in a single block? This block is ultimately related to a file (or a few files, in the case of dedup or snapshots/clones) whose name "zfs recv" could report for an admin to take action such as rsync.

If it is true that, unlike ZFS itself, the replication stream format has no redundancy (even of the ECC/CRC sort), how can it be used for long-term retention "on tape"? I understand about online transfers, somewhat: if the transfer failed, you still have the original to retry. But backups are often needed when the original is no longer alive - that's why they are needed ;) And by Murphy's law, that's when this single bit strikes ;)

Is such "tape" storage only intended for reliable media, such as another ZFS pool or a triple-redundancy tape archive with fancy robotics? How would it cope with BER in transfers to/from such media?

Also, an argument was recently posed (when I wrote of saving zfs send streams into files and transferring them by rsync over slow, bad links) that for most online transfers I should rather use zfs send of incremental snapshots. While I agree with this in the sense that an incremental transfer is presumably smaller and has less chance of corruption (network failure) during transfer than a huge initial stream, this chance of corruption is still non-zero. Simply, in the case of online transfers I can detect the error and retry at low cost (or big cost - bandwidth is not free in many parts of the world).

Going back to storing many streams (initial + increments) on tape - if an intermediate incremental stream has a single-bit error, then its snapshot and any that follow cannot be received into zfs. Even if the "broken" block is later freed and discarded (equivalent to overwriting with a newer version of a file from a newer increment in classic backup systems, where a file is the unit of backup). And since the total size of initial + incremental backups is likely larger than that of a single full dump, the chance of a single corruption making your (latest) backup useless would also be higher, right?

Thanks for clarifications,
//Jim Klimov
On Fri, June 10, 2011 07:47, Edward Ned Harvey wrote:
> #1 A single bit error causes a checksum mismatch and then the whole data
> stream is not receivable.

I wonder if it would be worth adding a (toggleable?) forward error correction (FEC) [1] scheme to the 'zfs send' stream. Even if we're talking about a straight zfs send/recv pipe, and not saving to a file, it'd be handy, as you wouldn't have to restart a large transfer for a single bit error (especially for those long initial syncs of remote 'mirrors').

[1] http://en.wikipedia.org/wiki/Forward_error_correction
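For anyone curious what the simplest form of such a scheme looks like, here is a toy sketch (not part of any zfs tooling - the chunk size, group size and framing are invented for illustration): group the stream into fixed-size chunks, checksum each one, and append one XOR parity chunk per group, so any single corrupt chunk in a group can be rebuilt from the survivors.

```python
# Toy FEC sketch: per-chunk CRCs detect which chunk is bad; one XOR parity
# chunk per group of GROUP data chunks rebuilds a single bad chunk.
# Note: the last chunk is zero-padded to CHUNK bytes, so for input that is
# not a multiple of CHUNK the decoder returns trailing zero padding.
import zlib

CHUNK = 1024   # bytes per data chunk (assumption for the example)
GROUP = 4      # data chunks per parity chunk (assumption)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    """Split data into chunks; return a list of (crc, chunk) frames with a
    parity frame appended after every GROUP data frames."""
    chunks = [data[i:i + CHUNK].ljust(CHUNK, b"\0")
              for i in range(0, len(data), CHUNK)]
    frames = []
    for g in range(0, len(chunks), GROUP):
        parity = bytes(CHUNK)
        for c in chunks[g:g + GROUP]:
            frames.append((zlib.crc32(c), c))
            parity = xor(parity, c)
        frames.append((zlib.crc32(parity), parity))
    return frames

def decode(frames):
    """Verify each frame's CRC; rebuild at most one bad chunk per group by
    XOR-ing the surviving chunks with the parity chunk."""
    out = []
    for g in range(0, len(frames), GROUP + 1):
        group = frames[g:g + GROUP + 1]
        bad = [i for i, (crc, c) in enumerate(group) if zlib.crc32(c) != crc]
        if len(bad) > 1:
            raise ValueError("more than one bad chunk in group; unrecoverable")
        if bad:
            fix = bytes(CHUNK)
            for i, (_, c) in enumerate(group):
                if i != bad[0]:
                    fix = xor(fix, c)
            group[bad[0]] = (zlib.crc32(fix), fix)
        out.extend(c for _, c in group[:-1])  # drop the parity frame
    return b"".join(out)
```

With GROUP data chunks per parity chunk this costs roughly 1/GROUP extra space, which is the "inflate the size of the data stream somewhat" trade-off mentioned later in the thread.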
> If it is true that unlike ZFS itself, the replication stream format has
> no redundancy (even of ECC/CRC sort), how can it be used for
> long-term retention "on tape"?

It can't. I don't think it has been documented anywhere, but I believe it has been well understood that if you don't trust your storage (tape, disk, floppies, punched cards, whatever), then you shouldn't trust your incremental streams on that storage. It's as if the ZFS design assumed that all incremental streams would be either perfect or retryable.

This is a huge problem for tape retention, not so much for disk retention. On a personal level I have handled this with a separate pool of fewer, larger and slower drives which serves solely as backup, taking incremental streams from the main pool every 20 minutes or so. Unfortunately, that approach breaks the legacy backup strategy of pretty much every company.

I think the message is that unless you can ensure the integrity of the stream, either backups should go to another pool, or zfs send/receive should not be a critical part of the backup strategy.

--
This message posted from opensolaris.org
2011-06-10 20:58, Marty Scholes wrote:
>> If it is true that unlike ZFS itself, the replication stream format has
>> no redundancy (even of ECC/CRC sort), how can it be used for
>> long-term retention "on tape"?
> It can't. I don't think it has been documented anywhere, but I believe
> that it has been well understood that if you don't trust your storage
> (tape, disk, floppies, punched cards, whatever), then you shouldn't
> trust your incremental streams on that storage.

Well, the whole point of this redundancy in ZFS is about not trusting any storage (maybe including RAM at some point - but so far it is requested to be ECC RAM) ;) Hell, we don't ultimately trust any storage... Oops, I forgot what I wanted to say next ;)

> It's as if the ZFS design assumed that all incremental streams would be
> either perfect or retryable.

Yup. Seems like another ivory-tower assumption ;)

> This is a huge problem for tape retention, not so much for disk retention.

Why? Because you can make mirrors or raidz of disks?

> On a personal level I have handled this with a separate pool of fewer,
> larger and slower drives which serves solely as backup, taking
> incremental streams from the main pool every 20 minutes or so.
>
> Unfortunately that approach breaks the legacy backup strategy of pretty
> much every company.

I'm afraid it also breaks backups of petabyte-sized arrays, where it is impractical to double or triple the number of racks with spinning drives, but practical to have a closet full of tapes for the automated robot to feed ;)

> I think the message is that unless you can ensure the integrity of the
> stream, either backups should go to another pool or zfs send/receive
> should not be a critical part of the backup strategy.

Or zfs streams can be improved to VALIDLY become part of such a strategy. Given the checksums in ZFS, as of now I guess we can send the ZFS streams to a file, compress this file with ZIP, RAR or some other format with CRC and some added "recoverability" (i.e. WinRAR claims to be able to repair about 1% of erroneous file data with standard settings), and send these ZIP/RAR archives to the tape. Obviously, a standard integrated solution within ZFS would be better and more portable. See the FEC suggestion from another poster ;)

//Jim
On Fri, Jun 10, 2011 at 8:59 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> Is such "tape" storage only intended for reliable media such as
> another ZFS or triple-redundancy tape archive with fancy robotics?
> How would it cope with BER in transfers to/from such media?

Large and small businesses have been using TAPE as a BACKUP medium for decades. One of the cardinal rules is that you MUST have at least TWO FULL copies if you expect to be able to use them. An incremental backup is marginally better than an incremental zfs send in that you _can_ recover the files contained in the backup image.

I understand why a zfs send stream is what it is (and why you can't pull individual files out of it), and that it must be bit-for-bit correct, and that IF it is large, then the chances of a bit error are higher. But given all that, I still have not heard a good reason NOT to keep zfs send stream images around as insurance. Yes, they must not be corrupt (that is true for ANY backup storage), and if they do get corrupted you cannot (without tweaks that may jeopardize the data integrity) "restore" that stream image. But this really is not a higher bar than for any other "backup" system. This is why I wondered at the original poster's comment that he had made a critical mistake (unless the mistake was using storage for the image that had a high chance of corruption, and not keeping a second copy of the image).

Sorry if this has been discussed here before; how much of this list I get to read depends on how busy I am. Right now I am very busy moving 20 TB of data from one configuration of 14 zpools to a configuration of one zpool (and only one dataset, so no zfs send / recv for me), so I have lots of time to wait, and I spend some of that time reading this list :-)

P.S. This data is "backed up", in both the old and new configurations, via regular zfs snapshots (for day-to-day needs) and zfs send / recv replication to a remote site (for DR needs). The initial full zfs send occurred when the new zpool was new and empty, so I only have to push the incrementals through the WAN link.
> I stored a snapshot stream to a file

The tragic irony here is that the file was stored on a non-zfs filesystem. You had undetected bitrot which unknowingly corrupted the stream. Other files might have been silently corrupted as well. You may have just made one of the strongest cases yet for zfs and its assurances.
On Jun 10, 2011, at 8:59 AM, David Magda wrote:
> On Fri, June 10, 2011 07:47, Edward Ned Harvey wrote:
>
>> #1 A single bit error causes checksum mismatch and then the whole data
>> stream is not receivable.
>
> I wonder if it would be worth adding a (toggleable?) forward error
> correction (FEC) [1] scheme to the 'zfs send' stream.

pipes are your friend!
-- richard
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> See the FEC suggestion from another poster ;)

Well, of course, all storage media have built-in hardware FEC - at least disk and tape for sure. But naturally you can't always trust it blindly... If you simply want to layer on some more FEC, there must be some standard generic FEC utilities out there, right?

zfs send | fec > /dev/...

Of course this will inflate the size of the data stream somewhat, but it improves the reliability...

But finally - if you think of a disk as one large sequential storage device, then a zfs send stream is just another large sequential data stream... And we take it for granted that a single bit error inside a ZFS filesystem doesn't corrupt the whole filesystem, but just a localized file or object. That is all because the filesystem is broken down into a bunch of smaller blocks, and each individual block has its own checksum. Shouldn't it be relatively trivial for the zfs send datastream to do its checksums on smaller chunks of data instead of just a single checksum for the whole set?
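To make the "checksum smaller chunks" idea concrete, here is a sketch of what such framing could look like. The record layout (an 8-byte length+CRC header before each 64 KiB chunk) is invented for illustration and is not the real zfs send record format; the point is only that a receiver can then pinpoint WHICH chunk is damaged instead of rejecting the whole stream on one mismatch.

```python
# Frame a byte stream as (length, crc32, payload) records so corruption is
# localized to a single chunk rather than invalidating the entire stream.
import io
import struct
import zlib

CHUNK = 64 * 1024               # illustrative chunk size
HDR = struct.Struct(">II")      # (payload length, crc32 of payload)

def frame(src, dst):
    """Copy src to dst, prefixing every chunk with its length and CRC."""
    while True:
        payload = src.read(CHUNK)
        if not payload:
            break
        dst.write(HDR.pack(len(payload), zlib.crc32(payload)))
        dst.write(payload)

def verify(src):
    """Yield (offset, ok, payload) per chunk; a bad chunk only flags
    itself, so the receiver could skip or re-request just that region."""
    offset = 0
    while True:
        hdr = src.read(HDR.size)
        if not hdr:
            break
        length, crc = HDR.unpack(hdr)
        payload = src.read(length)
        yield offset, zlib.crc32(payload) == crc, payload
        offset += length
```

Flipping a single byte in a framed 200 KB stream then flags exactly one chunk as bad while the other chunks remain usable, which is the localization property being asked for.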
On Jun 11, 2011, at 08:46, Edward Ned Harvey wrote:
> If you simply want to layer on some more FEC, there must be some
> standard generic FEC utilities out there, right?
> zfs send | fec > /dev/...
> Of course this will inflate the size of the data stream somewhat, but
> improves the reliability...

If one is saving streams to a disk, it may be worth creating parity files for them (especially if the destination file system is not ZFS):

http://en.wikipedia.org/wiki/Parity_file
http://en.wikipedia.org/wiki/Parchive
> From: David Magda [mailto:dmagda at ee.ryerson.ca]
> Sent: Saturday, June 11, 2011 9:04 AM
>
> If one is saving streams to a disk, it may be worth creating parity
> files for them (especially if the destination file system is not ZFS):

Parity is just a really simple form of error detection. It's not very useful for error correction. If you look into error correction codes, you'll see there are many other codes which would be more useful for the purpose of zfs send datastream integrity on long-term storage.
On Jun 11, 2011, at 09:20, Edward Ned Harvey wrote:
> Parity is just a really simple form of error detection. It's not very
> useful for error correction. If you look into error correction codes,
> you'll see there are many other codes which would be more useful for the
> purposes of zfs send datastream integrity on long-term storage.

"These parity files use a forward error correction-style system that can be used to perform data verification, and allow recovery when data is lost or corrupted."

http://en.wikipedia.org/wiki/Parchive

"Because this new approach doesn't benefit from like-sized files, it drastically extends the potential applications of PAR. Files such as video, music, and other data can remain in a usable format and still have recovery data associated with them. The technology is based on a 'Reed-Solomon Code' implementation that allows for recovery of any 'X' real data-blocks for 'X' parity data-blocks present."

http://parchive.sourceforge.net/
2011-06-11 17:20, Edward Ned Harvey wrote:
>> From: David Magda [mailto:dmagda at ee.ryerson.ca]
>> Sent: Saturday, June 11, 2011 9:04 AM
>>
>> If one is saving streams to a disk, it may be worth creating parity
>> files for them (especially if the destination file system is not ZFS):
> Parity is just a really simple form of error detection. It's not very
> useful for error correction. If you look into error correction codes,
> you'll see there are many other codes which would be more useful for the
> purposes of zfs send datastream integrity on long-term storage.

Well, parity lets you reconstruct the original data, if you can decide which pieces to trust, no? Either there are many fitting pieces and few misfitting pieces (raidzN), or you have checksums so you know which copy is correct, if any. Or, like some RAID implementations, you just trust the copy on a device which has not shown any errors (yet) ;)

But wait... if you have checksums to trust one of two copies - wouldn't it be easier to make an option to embed mirroring into the "zfs send" stream? For example, interleave chunks of the same data, each with its checksum, sized, say, 64Mb by default (not too heavy on cache, but big enough to be unlikely to be broken by the same external problem, like a surface scratch; further configurable by the sending user). Sample layout: AaA'a'BbB'b'... where "A" and "A'" are copies of the same data, "a" and "a'" are their checksums, the "B"s are the next set of chunks, etc.

PS: Do I understand correctly that inside a ZFS send stream there are no longer the original variably-sized blocks from the sending system, so the receiver can reconstruct blocks on its disk according to dedup, compression, and maybe larger coalesced block sizes for files originally written in small portions, etc.? If coalescing-on-write is indeed done, how does it play well with snapshots? I.e. if the original file was represented in snapshot#1 by a few small blocks, received in snapshot#1' as one big block, but later part of the file was changed, and the source snapshot#2 includes only the changed small blocks - what would the receiving snapshot#2' do? Or is there no coalescing, and is this why? ;)

Thanks,
//Jim Klimov
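The interleaved-mirror layout proposed above (AaA'a'BbB'b'...) can be sketched in a few lines. This is a toy model, not a real zfs stream format - the chunk size and framing are invented for the example; each chunk is written twice, each copy carries its own checksum, and the reader takes whichever copy verifies.

```python
# Toy model of an embedded-mirror stream: every data chunk is stored as two
# (chunk, crc) frames; the reader prefers whichever copy's checksum matches.
import zlib

CHUNK = 1024  # illustration only; the post suggests ~64 MB chunks in practice

def write_mirrored(data):
    """Return a list of (chunk, crc) frames with every chunk stored twice:
    A a A' a' B b B' b' ..."""
    frames = []
    for i in range(0, len(data), CHUNK):
        c = data[i:i + CHUNK]
        frames.append((c, zlib.crc32(c)))  # A  a
        frames.append((c, zlib.crc32(c)))  # A' a'
    return frames

def read_mirrored(frames):
    """Reassemble the stream, using whichever copy of each chunk verifies."""
    out = []
    for i in range(0, len(frames), 2):
        for c, crc in frames[i:i + 2]:
            if zlib.crc32(c) == crc:
                out.append(c)
                break
        else:
            raise ValueError("both copies of chunk %d corrupt" % (i // 2))
    return b"".join(out)
```

The cost is the obvious one: the stream doubles in size, in exchange for surviving any corruption that damages only one copy of a chunk (the "same external problem like a surface scratch" case is exactly what the large chunk spacing is meant to avoid).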
> From: David Magda [mailto:dmagda at ee.ryerson.ca]
> Sent: Saturday, June 11, 2011 9:38 AM
>
> These parity files use a forward error correction-style system that can
> be used to perform data verification, and allow recovery when data is
> lost or corrupted.
>
> http://en.wikipedia.org/wiki/Parchive

Well spotted. But par2 seems to be intended exclusively for use on files, not data streams: from a file (or files), you create some par2 files...

Does anyone know of a utility that allows you to layer FEC code into a data stream, suitable for piping?
On Jun 11, 2011, at 10:37, Edward Ned Harvey wrote:
> Well spotted. But par2 seems to be intended exclusively for use on
> files, not data streams. From a file (or files) create some par2 files...
>
> Anyone know of a utility that allows you to layer fec code into a data
> stream, suitable for piping?

Yes; I was thinking more of the stream-on-disk use case.

A FEC pipe might be a nice undergrad software project for someone. Perhaps by default multiplex the data and the FEC, and then on the other end do one of two things: de-multiplex things into the next part of the pipe, or split the FEC stream into one file and the original data into another.
On Jun 11, 2011, at 5:46 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>>
>> See the FEC suggestion from another poster ;)
>
> Well, of course, all storage mediums have built-in hardware FEC. At
> least disk & tape for sure. But naturally you can't always trust it
> blindly...
>
> If you simply want to layer on some more FEC, there must be some
> standard generic FEC utilities out there, right?
> zfs send | fec > /dev/...
> Of course this will inflate the size of the data stream somewhat, but
> improves the reliability...

The problem is that many FEC algorithms are good at correcting only a few bits. For example, disk drives tend to correct somewhere on the order of 8 bytes per block. Tapes can correct more bytes per block. I've collected a large number of error reports showing the bitwise analysis of data corruption we've seen in ZFS, and there is only one case where a stuck bit was detected. Most of the corruptions I see span multiple bytes, and many are zero-filled. In other words, if you are expecting to use FEC, and FEC only corrects a few bits, you might be disappointed.
-- richard