Hey all,

New to ZFS, I made a critical error when migrating data and configuring zpools according to needs - I stored a snapshot stream to a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]". When I attempted to receive the stream onto the newly configured pool, I ended up with a checksum mismatch and thought I had lost my data.

After googling the issue and finding nil, I downloaded FreeBSD 9-CURRENT (development), installed it, and recompiled the kernel with one modification to "/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c": comment out the following lines (1439-1440 at the time of writing):

    if (!ZIO_CHECKSUM_EQUAL(drre.drr_checksum, pcksum))
        ra.err = ECKSUM;

Once recompiled and booted on the new kernel, I executed "zfs receive -v [filesystem] <[stream_file]". Once received, I scrubbed the zpool, which corrected a couple of checksum errors, and proceeded to finish setting up my NAS.

Hopefully this might help someone else if they're stupid enough to make the same mistake I did...

Note: changing this section of the ZFS kernel code should not be used for anything other than special cases when you need to bypass the data integrity checks for recovery purposes.

-Johnny Walker
On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:

> New to ZFS, I made a critical error when migrating data and
> configuring zpools according to needs - I stored a snapshot stream to
> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".

Why is this a critical error? I thought you were supposed to be able to save the output from zfs send to a file (just as with tar or ufsdump you can save the output to a file or a stream).

> When I attempted to receive the stream onto the newly configured
> pool, I ended up with a checksum mismatch and thought I had lost my
> data.

Was the cause of the checksum mismatch just that the stream data was stored as a file? That does not seem right to me.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> Why is this a critical error, I thought you were supposed to be
> able to save the output from zfs send to a file (just as with tar or
> ufsdump you can save the output to a file or a stream) ?

Well yes, you can save the stream to a file, but it is intended for immediate use with "zfs receive". Since the stream is not an image but rather a serialization of objects, normal data recovery methods do not apply in the event of corruption.

>> When I attempted to receive the stream onto the newly configured
>> pool, I ended up with a checksum mismatch and thought I had lost my
>> data.
>
> Was the cause of the checksum mismatch just that the stream data
> was stored as a file ? That does not seem right to me.

I really can't say for sure what caused the corruption, but I think it may have been related to a dying power supply. For more information, check out:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storing_ZFS_Snapshot_Streams_.28zfs_send.2Freceive.29
2011-06-09 18:52, Paul Kraus wrote:

> On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> Why is this a critical error, I thought you were supposed to be
> able to save the output from zfs send to a file (just as with tar or
> ufsdump you can save the output to a file or a stream) ?
> Was the cause of the checksum mismatch just that the stream data
> was stored as a file ? That does not seem right to me.

As recently mentioned on the list (regarding tape backups, I believe), the zfs send stream format was not intended for long-term storage. If some bits in the saved file flipped, the stream becomes invalid regarding checksums and has to be resent. Besides, the format is not public and subject to change, I think, so future compatibility is not guaranteed.

Having said that, I have used dumping "zfs send" to files, rsyncing them over a slow connection, and zfs recv'ing them on another machine - so this is known to work. However, if it were to fail, I could retry (and/or use rsync to correct some misreceived blocks if the network was faulty).

--
+============================================================+
| Jim Klimov,                CTO                             |
| JSC COS&HT                                                 |
| +7-903-7705859 (cellular)  mailto:jimklimov at cos.ru      |
| CC: admin at cos.ru, jimklimov at mail.ru                  |
+============================================================+
| () ascii ribbon campaign - against html mail               |
| /\                       - against microsoft attachments   |
+============================================================+
On Thu, Jun 9, 2011 at 1:17 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> 2011-06-09 18:52, Paul Kraus wrote:
>> On Thu, Jun 9, 2011 at 8:59 AM, Jonathan Walker <kallous at gmail.com> wrote:
>>> New to ZFS, I made a critical error when migrating data and
>>> configuring zpools according to needs - I stored a snapshot stream to
>>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>>
>> Why is this a critical error, I thought you were supposed to be
>> able to save the output from zfs send to a file (just as with tar or
>> ufsdump you can save the output to a file or a stream) ?
>> Was the cause of the checksum mismatch just that the stream data
>> was stored as a file ? That does not seem right to me.
>
> As recently mentioned on the list (regarding tape backups, I believe)
> the zfs send stream format was not intended for long-term storage.

Only due to possible changes in the format.

> If some bits in the saved file flipped,

Then you have a bigger problem, namely that the file was corrupted. That is not a limitation of the zfs send format. If the stream gets corrupted via network transmission you have the same problem.

> the stream becomes invalid regarding checksums and has to be resent.
> Besides, the format is not public and subject to change, I think.
> So future compatibility is not guaranteed.

Recent documentation (the zfs man page) indicates that as of zpool/zfs version 15/4, I think, the stream format was committed, and receiving a stream from a given zfs dataset is supported on _newer_ zfs versions.

> Having said that, I have used dumping "zfs send" to files, rsyncing
> them over a slow connection, and zfs recv'ing them on another
> machine - so this is known to work.

I suppose to move data or for an initial copy that makes sense, but for long-term replication why not just use incremental zfs sends?

> However, if it were to fail, I could retry (and/or use rsync to correct
> some misreceived blocks if the network was faulty).

At some level we need to trust that the zfs send stream is intact.
2011-06-09 21:33, Paul Kraus wrote:
>> If some bits in the saved file flipped,
> Then you have a bigger problem, namely that the file was corrupted.
> That is not a limitation of the zfs send format. If the stream gets
> corrupted via network transmission you have the same problem.

No, it is not quite a limitation, but the longer you store a file, and the huger it is, the greater the probability of a single bit going wrong over time (i.e. on an old tape stored in a closet). And ZFS is very picky about having detected a non-integrity condition. Where other filesystems would feed you a broken file, and perhaps some other layer of integrity would be there to fix it, or you'd choose to ignore it, zfs will refuse to process known-bad data. As the original poster has shown, even within ZFS this problem can be worked around... if only ZFS would ask the admin what to do. Kudos to him for that! ;)

And because of a small chunk you may lose everything ;) I've had that happen under a customer's VMware ESX 3.0, which did not honour cache flushes, so ZFS broke down upon hardware resets (i.e. thermal failure) and panicked the kernel upon boot attempts. Reverting that virtual Solaris server to use UFS was sad - but it worked for years since then, even through such mischiefs as hardware thermal resets. I've tested that VM's image recently with the OI_151 dev LiveCD - even it panics on that pool. It took aok=1 and zfs_recover=1 and "zpool import -o ro -f -F pool" to roll back those last bad transactions. BTW, "-F -n" was not honoured - the pool was imported and the transactions were rolled back despite the message along the lines of "Would be able to recover to timestamp XXX"...

>> Having said that, I have used dumping "zfs send" to files, rsyncing
>> them over a slow connection, and zfs recv'ing them on another
>> machine - so this is known to work.
> I suppose to move data or for an initial copy that makes sense, but
> for long-term replication why not just use incremental zfs sends?

This was an initial copy (backing up a number of server setups from a customer) with tens of GB to send over a flaky 1Mbit link. It took many retries, and zfs send is not strong at retrying ;)

//Jim Klimov
On 09/06/11 1:33 PM, Paul Kraus wrote:
> On Thu, Jun 9, 2011 at 1:17 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>> As recently mentioned on the list (regarding tape backups, I believe)
>> the zfs send stream format was not intended for long-term storage.
>
> Only due to possible changes in the format.
>
>> If some bits in the saved file flipped,
>
> Then you have a bigger problem, namely that the file was corrupted.

This fragility is one of the main reasons it has always been discouraged (and regularly, on this list) as an archive format.

--Toby

> ...
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> Besides, the format is not public and subject to change, I think.
> So future compatibility is not guaranteed.

That is not correct. Years ago, there was a comment in the man page that said this:

"The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS."

But in the last several years, backward/forward compatibility has always been preserved, so despite the warning, it was never a problem. In more recent versions, the man page says:

"The format of the stream is committed. You will be able to receive your streams on future versions of ZFS."
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jonathan Walker
>
> New to ZFS, I made a critical error when migrating data and
> configuring zpools according to needs - I stored a snapshot stream to
> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".

There are precisely two reasons why it's not recommended to store a zfs send datastream for later use. As long as you can acknowledge and accept these limitations, then sure, go right ahead and store it. ;-) A lot of people do, and it's good.

#1 A single bit error causes a checksum mismatch, and then the whole data stream is not receivable. Obviously you encountered this problem already, and you were able to work around it. If I were you, however, I would be skeptical about data integrity on your system. You said you scrubbed and corrected a couple of errors, but that's not actually possible. The filesystem integrity checksums are for detection, not correction, of corruption. The only way corruption gets corrected is when there's a redundant copy of the data... Then ZFS can discard the corrupt copy, overwrite it with a good copy, and all the checksums suddenly match. Of course there is no such thing in the zfs send data stream - no redundant copy in the data stream. So yes, you have corruption. The best you can possibly do is to identify where it is, and then remove the affected files.

#2 You cannot do a partial receive, nor generate a catalog of the files within the datastream. You can restore the whole filesystem or nothing.
On 06/10/11 12:47, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jonathan Walker
>>
>> New to ZFS, I made a critical error when migrating data and
>> configuring zpools according to needs - I stored a snapshot stream to
>> a file using "zfs send -R [filesystem]@[snapshot] >[stream_file]".
>
> There are precisely two reasons why it's not recommended to store a zfs
> send datastream for later use. As long as you can acknowledge and accept
> these limitations, then sure, go right ahead and store it. ;-) A lot of
> people do, and it's good.

Not recommended by whom? Which documentation says this?

As I pointed out last time this came up, the NDMP service on Solaris 11 Express and on the Oracle ZFS Storage Appliance uses the 'zfs send' stream as what is to be stored on the "tape".

--
Darren J Moffat
2011-06-10 15:58, Darren J Moffat wrote:
> As I pointed out last time this came up, the NDMP service on Solaris 11
> Express and on the Oracle ZFS Storage Appliance uses the 'zfs send'
> stream as what is to be stored on the "tape".

This discussion turns interesting ;)

Just curious: how do these products work around the stream fragility we are discussing here - that a single-bit error can/will/should make the whole zfs send stream invalid, even though it is probably an error localized in a single block? This block is ultimately related to a file (or a few files, in the case of dedup or snapshots/clones) whose name "zfs recv" could report for an admin to take action such as rsync.

If it is true that, unlike ZFS itself, the replication stream format has no redundancy (even of the ECC/CRC sort), how can it be used for long-term retention "on tape"? I understand about online transfers, somewhat: if the transfer failed, you still have the original to retry. But backups are often needed when the original is no longer alive - that's why they are needed ;) And by Murphy's law, that's when this single bit strikes ;)

Is such "tape" storage only intended for reliable media, such as another ZFS pool or a triple-redundancy tape archive with fancy robotics? How would it cope with BER in transfers to/from such media?

Also, an argument was recently posed (when I wrote of saving zfs send streams into files and transferring them by rsync over slow, bad links) that for most online transfers I should rather use zfs send of incremental snapshots. While I agree with this in the sense that an incremental transfer is presumably smaller and has less chance of corruption (network failure) during transfer than a huge initial stream, this chance of corruption is still non-zero. Simply, in the case of online transfers I can detect the error and retry at low cost (or big cost - bandwidth is not free in many parts of the world).

Going back to storing many streams (initial + increments) on tape - if an intermediate incremental stream has a single-bit error, then its snapshot and any that follow cannot be received into zfs. Even if the "broken" block is later freed and discarded (equivalent to overwriting with a newer version of a file from a newer increment in classic backup systems, where a file is the unit of backup). And since the total size of initial + incremental backups is likely larger than that of a single full dump, the chance of a single corruption making your (latest) backup useless would also be higher, right?

Thanks for clarifications,
//Jim Klimov
On Fri, June 10, 2011 07:47, Edward Ned Harvey wrote:
> #1 A single bit error causes a checksum mismatch and then the whole data
> stream is not receivable.

I wonder if it would be worth adding a (toggleable?) forward error correction (FEC) [1] scheme to the 'zfs send' stream. Even if we're talking about a straight zfs send/recv pipe, and not saving to a file, it'd be handy, as you wouldn't have to restart a large transfer for a single bit error (especially for those long initial syncs of remote 'mirrors').

[1] http://en.wikipedia.org/wiki/Forward_error_correction
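For anyone curious what the simplest form of such a scheme looks like, here is a toy sketch (not part of any zfs tooling - the chunk size, group size and framing are invented for illustration): group the stream into fixed-size chunks, checksum each one, and append one XOR parity chunk per group, so any single corrupt chunk in a group can be rebuilt from the survivors.

```python
# Toy FEC sketch: per-chunk CRCs detect which chunk is bad; one XOR parity
# chunk per group of GROUP data chunks rebuilds a single bad chunk.
# Note: the last chunk is zero-padded to CHUNK bytes, so for input that is
# not a multiple of CHUNK the decoder returns trailing zero padding.
import zlib

CHUNK = 1024   # bytes per data chunk (assumption for the example)
GROUP = 4      # data chunks per parity chunk (assumption)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    """Split data into chunks; return a list of (crc, chunk) frames with a
    parity frame appended after every GROUP data frames."""
    chunks = [data[i:i + CHUNK].ljust(CHUNK, b"\0")
              for i in range(0, len(data), CHUNK)]
    frames = []
    for g in range(0, len(chunks), GROUP):
        parity = bytes(CHUNK)
        for c in chunks[g:g + GROUP]:
            frames.append((zlib.crc32(c), c))
            parity = xor(parity, c)
        frames.append((zlib.crc32(parity), parity))
    return frames

def decode(frames):
    """Verify each frame's CRC; rebuild at most one bad chunk per group by
    XOR-ing the surviving chunks with the parity chunk."""
    out = []
    for g in range(0, len(frames), GROUP + 1):
        group = frames[g:g + GROUP + 1]
        bad = [i for i, (crc, c) in enumerate(group) if zlib.crc32(c) != crc]
        if len(bad) > 1:
            raise ValueError("more than one bad chunk in group; unrecoverable")
        if bad:
            fix = bytes(CHUNK)
            for i, (_, c) in enumerate(group):
                if i != bad[0]:
                    fix = xor(fix, c)
            group[bad[0]] = (zlib.crc32(fix), fix)
        out.extend(c for _, c in group[:-1])  # drop the parity frame
    return b"".join(out)
```

With GROUP data chunks per parity chunk this costs roughly 1/GROUP extra space, which is the "inflate the size of the data stream somewhat" trade-off mentioned later in the thread.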
> If it is true that unlike ZFS itself, the replication stream format has
> no redundancy (even of ECC/CRC sort), how can it be used for
> long-term retention "on tape"?

It can't. I don't think it has been documented anywhere, but I believe it has been well understood that if you don't trust your storage (tape, disk, floppies, punched cards, whatever), then you shouldn't trust your incremental streams on that storage. It's as if the ZFS design assumed that all incremental streams would be either perfect or retryable.

This is a huge problem for tape retention, not so much for disk retention. On a personal level I have handled this with a separate pool of fewer, larger and slower drives which serves solely as backup, taking incremental streams from the main pool every 20 minutes or so. Unfortunately, that approach breaks the legacy backup strategy of pretty much every company.

I think the message is that unless you can ensure the integrity of the stream, either backups should go to another pool, or zfs send/receive should not be a critical part of the backup strategy.

--
This message posted from opensolaris.org
2011-06-10 20:58, Marty Scholes wrote:
>> If it is true that unlike ZFS itself, the replication stream format has
>> no redundancy (even of ECC/CRC sort), how can it be used for
>> long-term retention "on tape"?
> It can't. I don't think it has been documented anywhere, but I believe
> that it has been well understood that if you don't trust your storage
> (tape, disk, floppies, punched cards, whatever), then you shouldn't
> trust your incremental streams on that storage.

Well, the whole point of this redundancy in ZFS is about not trusting any storage (maybe including RAM at some point - but so far it is requested to be ECC RAM) ;) Hell, we don't ultimately trust any storage... Oops, I forgot what I wanted to say next ;)

> It's as if the ZFS design assumed that all incremental streams would be
> either perfect or retryable.

Yup. Seems like another ivory-tower assumption ;)

> This is a huge problem for tape retention, not so much for disk retention.

Why? Because you can make mirrors or raidz of disks?

> On a personal level I have handled this with a separate pool of fewer,
> larger and slower drives which serves solely as backup, taking
> incremental streams from the main pool every 20 minutes or so.
>
> Unfortunately that approach breaks the legacy backup strategy of pretty
> much every company.

I'm afraid it also breaks backups of petabyte-sized arrays, where it is impractical to double or triple the number of racks with spinning drives, but practical to have a closet full of tapes for the automated robot to feed ;)

> I think the message is that unless you can ensure the integrity of the
> stream, either backups should go to another pool or zfs send/receive
> should not be a critical part of the backup strategy.

Or zfs streams can be improved to VALIDLY become part of such a strategy. Given the checksums in ZFS, as of now I guess we can send the ZFS streams to a file, compress this file with ZIP, RAR or some other format with CRC and some added "recoverability" (i.e. WinRAR claims to be able to repair about 1% of erroneous file data with standard settings), and send these ZIP/RAR archives to the tape. Obviously, a standard integrated solution within ZFS would be better and more portable. See the FEC suggestion from another poster ;)

//Jim
On Fri, Jun 10, 2011 at 8:59 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> Is such "tape" storage only intended for reliable media such as
> another ZFS or triple-redundancy tape archive with fancy robotics?
> How would it cope with BER in transfers to/from such media?

Large and small businesses have been using TAPE as a BACKUP medium for decades. One of the cardinal rules is that you MUST have at least TWO FULL copies if you expect to be able to use them. An incremental backup is marginally better than an incremental zfs send in that you _can_ recover the files contained in the backup image.

I understand why a zfs send stream is what it is (and why you can't pull individual files out of it), and that it must be bit-for-bit correct, and that IF it is large, then the chances of a bit error are higher. But given all that, I still have not heard a good reason NOT to keep zfs send stream images around as insurance. Yes, they must not be corrupt (that is true for ANY backup storage), and if they do get corrupted you cannot (without tweaks that may jeopardize the data integrity) "restore" that stream image. But this really is not a higher bar than for any other "backup" system. This is why I wondered at the original poster's comment that he had made a critical mistake (unless the mistake was using storage for the image that had a high chance of corruption, and not keeping a second copy of the image).

Sorry if this has been discussed here before; how much of this list I get to read depends on how busy I am. Right now I am very busy moving 20 TB of data from one configuration of 14 zpools to a configuration of one zpool (and only one dataset, so no zfs send / recv for me), so I have lots of time to wait, and I spend some of that time reading this list :-)

P.S. This data is "backed up", in both the old and new configurations, via regular zfs snapshots (for day-to-day needs) and zfs send / recv replication to a remote site (for DR needs). The initial full zfs send occurred when the new zpool was new and empty, so I only have to push the incrementals through the WAN link.
> I stored a snapshot stream to a file

The tragic irony here is that the file was stored on a non-zfs filesystem. You had undetected bitrot which unknowingly corrupted the stream. Other files might have been silently corrupted as well. You may have just made one of the strongest cases yet for zfs and its assurances.
On Jun 10, 2011, at 8:59 AM, David Magda wrote:
> On Fri, June 10, 2011 07:47, Edward Ned Harvey wrote:
>
>> #1 A single bit error causes checksum mismatch and then the whole data
>> stream is not receivable.
>
> I wonder if it would be worth adding a (toggleable?) forward error
> correction (FEC) [1] scheme to the 'zfs send' stream.

pipes are your friend!
-- richard
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> See the FEC suggestion from another poster ;)

Well, of course, all storage media have built-in hardware FEC - at least disk and tape for sure. But naturally you can't always trust it blindly... If you simply want to layer on some more FEC, there must be some standard generic FEC utilities out there, right?

zfs send | fec > /dev/...

Of course this will inflate the size of the data stream somewhat, but it improves the reliability...

But finally - if you think of a disk as one large sequential storage device, then a zfs send stream is just another large sequential data stream... And we take it for granted that a single bit error inside a ZFS filesystem doesn't corrupt the whole filesystem, but just a localized file or object. That is all because the filesystem is broken down into a bunch of smaller blocks, and each individual block has its own checksum. Shouldn't it be relatively trivial for the zfs send datastream to do its checksums on smaller chunks of data instead of just a single checksum for the whole set?
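To make the "checksum smaller chunks" idea concrete, here is a sketch of what such framing could look like. The record layout (an 8-byte length+CRC header before each 64 KiB chunk) is invented for illustration and is not the real zfs send record format; the point is only that a receiver can then pinpoint WHICH chunk is damaged instead of rejecting the whole stream on one mismatch.

```python
# Frame a byte stream as (length, crc32, payload) records so corruption is
# localized to a single chunk rather than invalidating the entire stream.
import io
import struct
import zlib

CHUNK = 64 * 1024               # illustrative chunk size
HDR = struct.Struct(">II")      # (payload length, crc32 of payload)

def frame(src, dst):
    """Copy src to dst, prefixing every chunk with its length and CRC."""
    while True:
        payload = src.read(CHUNK)
        if not payload:
            break
        dst.write(HDR.pack(len(payload), zlib.crc32(payload)))
        dst.write(payload)

def verify(src):
    """Yield (offset, ok, payload) per chunk; a bad chunk only flags
    itself, so the receiver could skip or re-request just that region."""
    offset = 0
    while True:
        hdr = src.read(HDR.size)
        if not hdr:
            break
        length, crc = HDR.unpack(hdr)
        payload = src.read(length)
        yield offset, zlib.crc32(payload) == crc, payload
        offset += length
```

Flipping a single byte in a framed 200 KB stream then flags exactly one chunk as bad while the other chunks remain usable, which is the localization property being asked for.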
On Jun 11, 2011, at 08:46, Edward Ned Harvey wrote:
> If you simply want to layer on some more FEC, there must be some
> standard generic FEC utilities out there, right?
> zfs send | fec > /dev/...
> Of course this will inflate the size of the data stream somewhat, but
> improves the reliability...

If one is saving streams to a disk, it may be worth creating parity files for them (especially if the destination file system is not ZFS):

http://en.wikipedia.org/wiki/Parity_file
http://en.wikipedia.org/wiki/Parchive
> From: David Magda [mailto:dmagda at ee.ryerson.ca]
> Sent: Saturday, June 11, 2011 9:04 AM
>
> If one is saving streams to a disk, it may be worth creating parity
> files for them (especially if the destination file system is not ZFS):

Parity is just a really simple form of error detection. It's not very useful for error correction. If you look into error correction codes, you'll see there are many other codes which would be more useful for the purpose of zfs send datastream integrity on long-term storage.
On Jun 11, 2011, at 09:20, Edward Ned Harvey wrote:
> Parity is just a really simple form of error detection. It's not very
> useful for error correction. If you look into error correction codes,
> you'll see there are many other codes which would be more useful for the
> purposes of zfs send datastream integrity on long-term storage.

"These parity files use a forward error correction-style system that can be used to perform data verification, and allow recovery when data is lost or corrupted."

http://en.wikipedia.org/wiki/Parchive

"Because this new approach doesn't benefit from like-sized files, it drastically extends the potential applications of PAR. Files such as video, music, and other data can remain in a usable format and still have recovery data associated with them. The technology is based on a 'Reed-Solomon Code' implementation that allows for recovery of any 'X' real data-blocks for 'X' parity data-blocks present."

http://parchive.sourceforge.net/
2011-06-11 17:20, Edward Ned Harvey wrote:
>> From: David Magda [mailto:dmagda at ee.ryerson.ca]
>> Sent: Saturday, June 11, 2011 9:04 AM
>>
>> If one is saving streams to a disk, it may be worth creating parity
>> files for them (especially if the destination file system is not ZFS):
> Parity is just a really simple form of error detection. It's not very
> useful for error correction. If you look into error correction codes,
> you'll see there are many other codes which would be more useful for the
> purposes of zfs send datastream integrity on long-term storage.

Well, parity lets you reconstruct the original data, if you can decide which pieces to trust, no? Either there are many fitting pieces and few misfitting pieces (raidzN), or you have checksums so you know which copy is correct, if any. Or, like some RAID implementations, you just trust the copy on a device which has not shown any errors (yet) ;)

But wait... if you have checksums to trust one of two copies - wouldn't it be easier to make an option to embed mirroring into the "zfs send" stream? For example, interleave chunks of the same data, each with its checksum, sized, say, 64Mb by default (not too heavy on cache, but big enough to be unlikely to be broken by the same external problem, like a surface scratch; further configurable by the sending user). Sample layout: AaA'a'BbB'b'... where "A" and "A'" are copies of the same data, "a" and "a'" are their checksums, the "B"s are the next set of chunks, etc.

PS: Do I understand correctly that inside a ZFS send stream there are no longer the original variably-sized blocks from the sending system, so the receiver can reconstruct blocks on its disk according to dedup, compression, and maybe larger coalesced block sizes for files originally written in small portions, etc.? If coalescing-on-write is indeed done, how does it play well with snapshots? I.e. if the original file was represented in snapshot#1 by a few small blocks, received in snapshot#1' as one big block, but later part of the file was changed, and the source snapshot#2 includes only the changed small blocks - what would the receiving snapshot#2' do? Or is there no coalescing, and is this why? ;)

Thanks,
//Jim Klimov
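The interleaved-mirror layout proposed above (AaA'a'BbB'b'...) can be sketched in a few lines. This is a toy model, not a real zfs stream format - the chunk size and framing are invented for the example; each chunk is written twice, each copy carries its own checksum, and the reader takes whichever copy verifies.

```python
# Toy model of an embedded-mirror stream: every data chunk is stored as two
# (chunk, crc) frames; the reader prefers whichever copy's checksum matches.
import zlib

CHUNK = 1024  # illustration only; the post suggests ~64 MB chunks in practice

def write_mirrored(data):
    """Return a list of (chunk, crc) frames with every chunk stored twice:
    A a A' a' B b B' b' ..."""
    frames = []
    for i in range(0, len(data), CHUNK):
        c = data[i:i + CHUNK]
        frames.append((c, zlib.crc32(c)))  # A  a
        frames.append((c, zlib.crc32(c)))  # A' a'
    return frames

def read_mirrored(frames):
    """Reassemble the stream, using whichever copy of each chunk verifies."""
    out = []
    for i in range(0, len(frames), 2):
        for c, crc in frames[i:i + 2]:
            if zlib.crc32(c) == crc:
                out.append(c)
                break
        else:
            raise ValueError("both copies of chunk %d corrupt" % (i // 2))
    return b"".join(out)
```

The cost is the obvious one: the stream doubles in size, in exchange for surviving any corruption that damages only one copy of a chunk (the "same external problem like a surface scratch" case is exactly what the large chunk spacing is meant to avoid).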
> From: David Magda [mailto:dmagda at ee.ryerson.ca]
> Sent: Saturday, June 11, 2011 9:38 AM
>
> These parity files use a forward error correction-style system that can
> be used to perform data verification, and allow recovery when data is
> lost or corrupted.
>
> http://en.wikipedia.org/wiki/Parchive

Well spotted. But par2 seems to be intended exclusively for use on files, not data streams: from a file (or files), you create some par2 files...

Does anyone know of a utility that allows you to layer FEC code into a data stream, suitable for piping?
On Jun 11, 2011, at 10:37, Edward Ned Harvey wrote:
> Well spotted. But par2 seems to be intended exclusively for use on
> files, not data streams. From a file (or files) create some par2 files...
>
> Anyone know of a utility that allows you to layer fec code into a data
> stream, suitable for piping?

Yes; I was thinking more of the stream-on-disk use case.

A FEC pipe might be a nice undergrad software project for someone. Perhaps by default multiplex the data and the FEC, and then on the other end do one of two things: de-multiplex things into the next part of the pipe, or split the FEC stream into one file and the original data into another.
On Jun 11, 2011, at 5:46 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>>
>> See the FEC suggestion from another poster ;)
>
> Well, of course, all storage mediums have built-in hardware FEC. At
> least disk & tape for sure. But naturally you can't always trust it
> blindly...
>
> If you simply want to layer on some more FEC, there must be some
> standard generic FEC utilities out there, right?
> zfs send | fec > /dev/...
> Of course this will inflate the size of the data stream somewhat, but
> improves the reliability...

The problem is that many FEC algorithms are good at correcting only a few bits. For example, disk drives tend to correct somewhere on the order of 8 bytes per block. Tapes can correct more bytes per block. I've collected a large number of error reports showing the bitwise analysis of data corruption we've seen in ZFS, and there is only one case where a stuck bit was detected. Most of the corruptions I see span multiple bytes, and many are zero-filled. In other words, if you are expecting to use FEC, and FEC only corrects a few bits, you might be disappointed.
-- richard