Jonathan Wheeler
2008-Aug-10 11:18 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
Hi Folks,

I'm in the very unsettling position of fearing that I've lost all of my data via a zfs send/receive operation, despite ZFS's legendary integrity.

The error that I'm getting on restore is:

receiving full stream of faith/home@09-08-08 into Z/faith/home@09-08-08
cannot receive: invalid stream (checksum mismatch)

Background:
I was running snv_91, and decided to upgrade to snv_95, converting to the much awaited zfs-root in the process. On snv_91, I was using zfs for /opt, /export/home, and a couple of other file systems under /export.

I expected that converting to zfs root would require completely formatting my disk, so I needed to back up all of my critical data to a remote host beforehand. My main file server is running snv_71, using an 8-disk raid-z, with plenty of space available via nfs, so I directed a zfs send across nfs to it. So it was zfs -> nfs -> zfs (raid-z).

I don't remember the exact commands used, but I started off with a zfs snapshot -r, and then did a zfs send zfs@snapshot > /my/nfs/server/backup.zfs. This sent each of the filesystems across and redirected them into the one, single "backup" file. I wasn't all that confident that this was a wise move, as I didn't know how I was going to get just one fs (rather than all) extracted again at a later time using zfs receive (I'm open to answers on that one still!). So, I decided to *also* send just the snapshot of my home directory, which contains all of my vital information. A bit of extra peace of mind, eh? 2 backups are better than one....

I then installed snv_95 from dvd, using zfs-root, destroying my previous zpool on the disk in the process.

Here I am now, trying to restore my vital data that I backed up onto the nfs server, but it's not working!

# cat justhome.zfs | zfs receive -v Z/faith/home
receiving full stream of faith/home@09-08-08 into Z/faith/home@09-08-08
cannot receive: invalid stream (checksum mismatch)

I just don't understand what's going on here. I started off restoring across nfs to my desktop with the standard options. I've tried disabling checksumming on the parent zfs fs, to ensure that when it was restoring it wouldn't be using checksumming. I still got the checksum mismatch error.

Next I tried restoring the zfs backup internally within the nfs server, making it all local disk traffic, on the off chance that it was the network on my new build that was somehow broken. No dice, same error, with or without checksumming on the parent fs.

I've also tried my other backup file, but that's also having the same problem. In all I've tried about 8 combinations, and I'm breaking out in a sweat at the possibility of having lost all of my data.

The zfs backup that included all file systems bombs out fairly early, on a small fs that was only a few GB. The zfs backup that included just my home fs gets around 20GB of the way through before failing with the same error (and deleting the partial zfs fs). I don't recall how big the original home fs was, perhaps 30-40GB, so it's a fair way through.

What's causing this error, and if this situation is as dire as I'm fearing (please tell me it's not so!), why can't I at least have the 20GB of data that it can restore before it bombs out with that checksum error?

Thanks for any help with this!

Jonathan
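For reference, the backup sequence described above would have looked roughly like the following. This is a sketch only; the poster does not remember the exact commands, so the use of -R, the snapshot name, and the NFS paths are assumptions reconstructed from later posts in the thread:

    # take a recursive snapshot of the whole pool
    zfs snapshot -r faith@09-08-08

    # send everything into a single file on the NFS-mounted file server
    # (-R builds one stream covering all descendant filesystems)
    zfs send -R faith@09-08-08 > /net/supernova/Z/backup/angelous/pre-zfsroot.zfs

    # separately, send just the home filesystem to a second file
    zfs send faith/home@09-08-08 > /net/supernova/Z/backup/angelous/justhome.zfs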
Jonathan Wheeler
2008-Aug-12 10:39 UTC
[zfs-discuss] corrupt zfs stream? "checksum mismatch"
Hi folks,

Perhaps I was a little verbose in my first post, putting a few people off.

Does anyone else have any ideas on this one? I can't be the first person to have had a problem with a zfs backup stream.

Is there nothing that can be done to recover at least some of the stream? As another helpful chap pointed out, if tar encounters an error in the bitstream it just moves on until it finds usable data again. Can zfs not do something similar? I'll take whatever I can get!

Jonathan
Mattias Pantzare
2008-Aug-12 12:21 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
2008/8/10 Jonathan Wheeler <griffous at griffous.net>:
> Hi Folks,
>
> I'm in the very unsettling position of fearing that I've lost all of my data via a zfs send/receive operation, despite ZFS's legendary integrity.
>
> The error that I'm getting on restore is:
> receiving full stream of faith/home@09-08-08 into Z/faith/home@09-08-08
> cannot receive: invalid stream (checksum mismatch)
>
> Background:
> I was running snv_91, and decided to upgrade to snv_95, converting to the much awaited zfs-root in the process.

You could try to restore on a snv_91 system.

zfs send streams are not for backups. This is from the zfs man page:

     The format of the stream is evolving. No backwards compatibility
     is guaranteed. You may not be able to receive your streams on
     future versions of ZFS.

Or the file was corrupted when you transferred it.
>>>>> "mp" == Mattias Pantzare <pantzer at ludd.ltu.se> writes:mp> Or the file was corrupted when you transfered it. he stored the backup streams on ZFS, so obviously they couldn''t possibly be corrupt. :p Jonathan, does ''zfs receive -nv'' also detect the checksum error, or is it only detected when you actually receive onto a pool without -n? in addition to skipping to the next header of corrupted tarballs, tar can validate a tarball''s checksums without extracting it, so it''s possible to write a tape, then read it to see if it''s ok. The ''tar t'' read test checks for medium errors, driver bugs, and bugs inside tar itself. so it sounds like: brrk, brrk, danger, do not use zfs send/receive for backups---use only for moving filesystems from one pool to another. This brings back the question ``how is it possible to back up and restore a heavily-cloned/snapshotted system?'''' because upon restore the clone inheritance tree is lost, and you''ll never have enough space in the pool to fit what was there before. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080812/f349ef3e/attachment.bin>
Jonathan Wheeler
2008-Aug-13 13:06 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
Hi Mattias & Miles.

To test the version mismatch theory, I set up a snv_91 VM (using virtualbox) on my snv_95 desktop, and tried the zfs receive again. Unfortunately the symptoms are exactly the same: around the ~20GB mark, the justhome.zfs stream still bombs out with the checksum error.

I didn't realise that the zfs stream format wasn't backward compatible at the time that I made the backup, but having performed the above test, this doesn't actually appear to be my problem. I wish it were - that I could have dealt with! :(

So far we've established that in this case:
* Version mismatches aren't causing the problem.
* Receiving across the network isn't the issue (because I have the exact same issue restoring the stream directly on my file server).
* All that's left is the initial send, and since zfs guarantees end-to-end data integrity, it should have been able to deal with any possible network randomness in the middle (zfs on both ends) - or at absolute worst, the zfs send command should have failed if it encountered errors. Seems fair, no?

So, is there a major bug here, or at least an oversight in the zfs send part of the code? Does zfs send not do checksumming, or verification after sending? I'm not sure how else to interpret this data.

Today, to add some more datapoints, I repeated a zfs send to the same nfs server from the same desktop, though this time I'm using zfs root with snv_95. Same hardware, same network, same commands, but this time I didn't have any issues with the zfs receive. ?!?!?!?!

Miles: zfs receive -nv works ok:

# zfs receive -vn rpool/test < /net/supernova/Z/backup/angelous/justhome.zfs
would receive full stream of faith/home@09-08-08 into rpool/test@09-08-08

Where it gets interesting is with my recursive zfs dump:

bash-3.2# zfs receive -nvF -d rpool/test < /net/supernova/Z/backup/angelous/pre-zfsroot.zfs
would receive full stream of faith@09-08-08 into rpool/test@09-08-08
would receive full stream of faith/virtualmachines@09-08-08 into rpool/test/virtualmachines@09-08-08
would receive full stream of faith/opt@09-08-08 into rpool/test/opt@09-08-08
would receive full stream of faith/home@09-08-08 into rpool/test/home@09-08-08

faith@09-08-08 is actually empty.
faith/virtualmachines@09-08-08 bombs out around 2GB in, but I'm not really too worried about that fs.
faith/opt@09-08-08 is also another fs that I can live without.
faith/home@09-08-08 is the one that we're after.

It would seem that my justhome.zfs dump (containing only faith/home@09-08-08) isn't going to work, but is there some way to recover the /home fs from the pre-zfsroot.zfs dump? Since there seems to be a problem with the first fs (faith/virtualmachines), I need to find a way to skip restoring that zfs, so it can focus on the faith/home fs. How can this be achieved with zfs receive?

Jonathan
Mattias Pantzare
2008-Aug-13 13:37 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
2008/8/13 Jonathan Wheeler <griffous at griffous.net>:
> So far we've established that in this case:
> * Version mismatches aren't causing the problem.
> * Receiving across the network isn't the issue (because I have the exact same issue restoring the stream directly on my file server).
> * All that's left is the initial send, and since zfs guarantees end-to-end data integrity, it should have been able to deal with any possible network randomness in the middle (zfs on both ends) - or at absolute worst, the zfs send command should have failed if it encountered errors. Seems fair, no?
>
> So, is there a major bug here, or at least an oversight in the zfs send part of the code?
> Does zfs send not do checksumming, or verification after sending? I'm not sure how else to interpret this data.

zfs send can't do any verification after sending. It is sending to a pipe; it does not know that it is writing to a file. zfs receive can verify the data, as you know.

ZFS is not involved in moving the data over the network when you are using NFS.

There are many places where data can get corrupt even when you are using ZFS. Non-ECC memory is one example.

There might be a bug in zfs, but that is hard to check as you can't reproduce the problem.
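To make the point concrete: because zfs send only writes an opaque stream to stdout, the one real end-to-end check after redirecting it to a file is a trial receive. A sketch of that pattern follows; the dataset and path names are assumptions, not commands from the thread:

    # the send side has no way to re-read or verify the file it was piped into
    zfs send faith/home@09-08-08 > /net/backup/justhome.zfs

    # the checksum embedded in the stream is only verified on receive, so a
    # trial receive into a scratch dataset is how you prove the file is good
    zfs receive scratchpool/verify < /net/backup/justhome.zfs \
        && zfs destroy -r scratchpool/verify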
Mattias Pantzare wrote:
> 2008/8/13 Jonathan Wheeler <griffous at griffous.net>:
>> So far we've established that in this case:
>> * Version mismatches aren't causing the problem.
>> * Receiving across the network isn't the issue (because I have the exact same issue restoring the stream directly on my file server).
>> * All that's left is the initial send, and since zfs guarantees end-to-end data integrity, it should have been able to deal with any possible network randomness in the middle (zfs on both ends) - or at absolute worst, the zfs send command should have failed if it encountered errors. Seems fair, no?
>>
>> So, is there a major bug here, or at least an oversight in the zfs send part of the code?
>> Does zfs send not do checksumming, or verification after sending? I'm not sure how else to interpret this data.
>
> zfs send can't do any verification after sending. It is sending to a pipe; it does not know that it is writing to a file. zfs receive can verify the data, as you know.
>
> ZFS is not involved in moving the data over the network when you are using NFS.

ZFS is never involved in moving data over the network. It doesn't know anything about networking. Even if you are using iSCSI or FCoE, ZFS still doesn't know about networking; the "disk" layers do. For the ZFS send/recv cases, as you said, it just writes to stdout and reads from stdin.

--
Darren J Moffat
>>>>> "jw" == Jonathan Wheeler <griffous at griffous.net> writes: >>>>> "mp" == Mattias Pantzare <pantzer at ludd.ltu.se> writes:jw> Miles: zfs receive -nv works ok one might argue ''zfs receive'' should validate checksums with the -n option, so you can check if a just-written dump is clean before counting on it. Without this, even with hindsight bias it''s really hard to blame the sysadmin instead of ZFS this time. jw> Since there seems to be a problem with the first fs jw> (faith/virtualmachines), I need to find a way to skip jw> restoring that zfs, so it can focus on the faith/home fs. right. you do not even need a fix for the supposed corruption, just for the pedantry. mp> There might be a bug in zfs but that is hard to check as you mp> can''t reproduce the problem. the ''zfs receive'' problem happens every time one tries to restore that file, and he still has the file, so it''s reproduceable in that sense. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080813/6d50a05d/attachment.bin>
Jonathan Wheeler
2008-Aug-13 14:10 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
Thanks for the information, I'm learning quite a lot from all this.

It seems to me that zfs send *should* be doing some kind of verification, since some work has clearly been put into zfs so that filesystems can be dumped into files/pipes. It's a great feature to have, and I can't believe that this was purely for zfs send | zfs receive scenarios.

A common example used all over the place is zfs send | ssh $host. In these examples, is ssh guaranteeing the data delivery somehow? If not, there need to be some serious asterisks in these guides! Looking at this at a level that I do understand, it's going via TCP, which checksums packets..... then again, I was using nfs over TCP, and look where I am today. So much for that!

As I google these subjects more and more, I fear that I'm hitting the conceptual mental block that many before me have done also. zfs send is not zfsdump, even though it sure looks the same, and it's not clearly stated that you may end up in a situation like the one I'm in today if you don't somehow test your backups.

As you've rightly pointed out, it's done now, and even if I did manage to reproduce this again, that won't help my data locked away in these 2 .zfs files, so focusing on the hopeful: is there anything I can do to recover my data from these zfs dumps? Anything at all :)

If the problem is "just" that "zfs receive" is checksumming the data on the way in, can I disable this somehow within zfs? Can I globally disable checksumming in the kernel module? mdb something or other?

I read this thread where someone did successfully manage to recover data from a damaged zfs, which fills me with some hope:
http://www.opensolaris.org/jive/thread.jspa?messageID=220125

It's way over my head, but if anyone can tell me the mdb commands I'm happy to try them, even if they do kill my cat. I don't really have anything to lose with a copy of the data, and I'll do it all in a VM anyway.

Thanks,
Jonathan
>>>>> "jw" == Jonathan Wheeler <griffous at griffous.net> writes:jw> A common example used all over the place is zfs send | ssh jw> $host. In these examples is ssh guaranteeing the data delivery jw> somehow? it is really all just appologetics. It sounds like a zfs bug to me. The only alternative is bad hardware (not disks), so you could try memory testers, continuous big ''make -n <big number, like 4 - 10>'' builds, scripted continuous zpool send/recv, to look for this. jw> you may end up in a situation like the one I''m in today if you jw> don''t somehow test your backups. which is why I asked you to check -n spots it. It doesn''t---the tool gives you no way to test the backups! I''ve lost before because I backed things up onto tape, wiped the original, and then had the tape go bad. The idea of backups is to always have two copies, so I should have written two tapes. but I don''t see any reason to believe you wouldn''t get two bad copies in your case since it sounds like a bug. I also made the mistake of using FancyTape---I used some DAT bullshit with a ``table of contents'''' that can become ``corrupt'''' if you power off the drive at the wrong moment, which simpler tape formats don''t have. DAT also has these block checksums, where some drives if they can''t read part of the tape, they just hang forever and can''t seek past it. (weirdly analagous to zfs receive). I had already learned not to gzip a tarball before writing it to tape if the tarball contained mostly uncompressable things, because the gzip format is less robust than the tar format. but, I got bitten anyway because of the stupid tape TOC and the poor exception handling in the DAT drive''s firmware. What''s required, *given hindsight*, is to realize that the purpose of backups for ZFS users is partly to protect ourselves from ZFS bugs, so the backups need to be stored in a format that has nothing to do with ZFS, like tar or UDF or a non-ZFS filesystem. however if you have lots of snapshots or clones, I''m not sure this is possible because the data expands too much. In that case I might store backups in an zpool rather than in a file, because I expect zpool corruption bugs will get more attention sooner than ''zfs send'' corruption bugs. but, that''s still sketchy, and had it not been for your experience, I might have trusted the zfs send format. ``learn'''', fine, but I don''t think you''ve done anything unreasonable. jw> is there anything I can do to recover my data from these zfs jw> dumps? Anything at all :) fix ''zfs receive'' to ignore the error? :) burry the dumps in the sand for two years, and hope someone else fixes ZFS in the mean time? :) That''s what I did to my tape with the bad TOC. no good news yet. jw> If the problem is "just" that "zfs receive" is checksumming jw> the data on the way in, can I disable this somehow within zfs? jw> Can I globally disable checksumming in the kernel module? mdb jw> something or rather? sounds plausible but I don''t know how, so please let me know if you find a way. I found also some magic /etc/system incantations, but it doesn''t seem to apply to ''zfs receive''. It''s more of what you found, more ``simon sez, import!'''' stuff: http://opensolaris.org/jive/message.jspa?messageID=192572#194209 http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080813/d37b7804/attachment.bin>
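A scripted continuous send/receive stress test of the kind suggested above might look roughly like this. It is only a sketch; the pool and dataset names are placeholders, not anything from the thread:

    #!/bin/ksh
    # hammer the send/receive path repeatedly; any iteration that fails to
    # receive cleanly points at flaky hardware or a send/recv problem
    i=0
    while [ $i -lt 100 ]; do
        zfs snapshot testpool/data@stress$i
        zfs send testpool/data@stress$i | zfs receive scratchpool/copy$i
        if [ $? -ne 0 ]; then
            echo "send/receive failed on iteration $i" >&2
            exit 1
        fi
        zfs destroy -r scratchpool/copy$i
        zfs destroy testpool/data@stress$i
        i=$((i + 1))
    done
    echo "all iterations received cleanly"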
Jonathan Wheeler wrote:
> Thanks for the information, I'm learning quite a lot from all this.
>
> It seems to me that zfs send *should* be doing some kind of verification, since some work has clearly been put into zfs so that filesystems can be dumped into files/pipes. It's a great feature to have, and I can't believe that this was purely for zfs send | zfs receive scenarios.

zfs send/receive is not a backup solution because it does not have the features generally expected in a backup solution. It is a very low-level method of replicating dataset structure. If you find documentation to the contrary which was created after CR 6399918 was integrated, then please file a new bug.
http://bugs.opensolaris.org/view_bug.do?bug_id=6399918

> A common example used all over the place is zfs send | ssh $host. In these examples, is ssh guaranteeing the data delivery somehow? If not, there need to be some serious asterisks in these guides!

In this case, the receive does checks and will fail when the checks do not pass. In such cases, the send can be restarted. ssh performs encryption, and encryption codes tend to be more robust because a corruption will tend to fail upon decryption (including the surrounding checksum checks). If you save the contents of the pipe somewhere, then you are at the mercy of the robustness of the saved location.

However, there is more that can be done here, both inside and outside of ZFS. For inside ZFS, I have filed an RFE: CR 6736837, improve send/receive fault tolerance. However, to be effective, we really need a better understanding of the failures we expect to encounter.

As an interim step, know that a send will create the same stream because it is sending a stable set of data. You can send to files twice, on diverse storage, and then compare the resulting files. In other words, the flexibility of UNIX pipes is exposed by zfs send/receive.

> Looking at this at a level that I do understand, it's going via TCP, which checksums packets..... then again, I was using nfs over TCP, and look where I am today. So much for that!

I do not think you will be able to identify the root cause of your corruption -- there are far too many dependents and you do not have a known-good reference :-(.

> As I google these subjects more and more, I fear that I'm hitting the conceptual mental block that many before me have done also. zfs send is not zfsdump, even though it sure looks the same, and it's not clearly stated that you may end up in a situation like the one I'm in today if you don't somehow test your backups.

Correct, though this applies to everything, in general. One backup method I use (I use several ;-) is to use send/receive to a removable disk, usually a USB disk. I can then set up compression and redundancy policies for the USB disk and also periodically scrub to test the retention. This also offers the ability to go back to any snapshot in a matter of minutes, even though I store the USB disk in a fire safe.

Another benefit of this method is that I can easily verify the media -- I was once a user of 8mm tape drives, so I've got several scars related to the inability to recover data from tapes (they had a nasty habit of writing tapes that couldn't be read from other 8mm drives, so if you had to repair your drive (likely), then you might not be able to read your tapes).

> As you've rightly pointed out, it's done now, and even if I did manage to reproduce this again, that won't help my data locked away in these 2 .zfs files, so focusing on the hopeful: is there anything I can do to recover my data from these zfs dumps? Anything at all :)

I filed RFE CR 6736794, option for partial zfs receives. But I'm not confident that it can be implemented easily or quickly.

> If the problem is "just" that "zfs receive" is checksumming the data on the way in, can I disable this somehow within zfs?
> Can I globally disable checksumming in the kernel module? mdb something or other?
>
> I read this thread where someone did successfully manage to recover data from a damaged zfs, which fills me with some hope:
> http://www.opensolaris.org/jive/thread.jspa?messageID=220125
>
> It's way over my head, but if anyone can tell me the mdb commands I'm happy to try them, even if they do kill my cat. I don't really have anything to lose with a copy of the data, and I'll do it all in a VM anyway.

With mdb and the source, all things are possible. But I'll have to defer to someone who uses mdb more frequently than I.
-- richard
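The removable-disk approach described above keeps the backup inside a pool (so it can be scrubbed and browsed) rather than as a raw stream file. A rough sketch of that workflow, assuming a hypothetical device name and pool layout not taken from the thread:

    # create a pool on the USB disk and give it its own protection policy
    zpool create usbbackup c5t0d0
    zfs set compression=on usbbackup
    zfs set copies=2 usbbackup

    # replicate the live datasets onto it with send | receive, instead of
    # leaving an opaque stream file around
    zfs snapshot -r faith@backup-2008-08-09
    zfs send -R faith@backup-2008-08-09 | zfs receive -dF usbbackup

    # periodically verify the media end to end
    zpool scrub usbbackup
    zpool status usbbackup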
There is an explicit check in ZFS for the checksum, as you deduced. I suspect that by disabling this check you could recover much, if not all, of your data. You could probably do this with mdb by 'simply' writing a NOP over the branch in dmu_recv_stream.

It appears that 'zfs send' was designed to generate a stream which would immediately be consumed by 'zfs recv'. A simple checksum suffices, then, to detect problems in transmission (or certain classes of bugs on the sending side), since the operation can be retried on error. If the stream will be stored in any way, however, redundancy should be included in the stream (a la the VMS Backup utility).
Jonathan Wheeler
2008-Aug-15 13:32 UTC
[zfs-discuss] corrupt zfs stream? checksum mismatch
Hi Richard,

Thanks for the detailed reply, and the work behind the scenes filing the CRs. I've bookmarked both, and will keep a keen eye on them for status changes.

As Miles put it, I'll have to put these dumps into storage for possible future use. I do dearly hope that I'll be able to recover most of that data in the future, but for the most important bits (documents/spreadsheets), I'll have to rebuild them by way of some rather intensive data entry based on hard copies, now. Not fun. I do have a working [zfs send dump!] backup from October, so it's not a total loss of my livelihood, but it'll be a life lesson alright.

With CR 6736794, I wonder if some extra notes could be added around the checksumming side of the code? The wording that has been used doesn't quite match my scenario, but I certainly agree with the functionality that has been requested there.

I have a 50GB zfs send dump, and zfs receive is failing (and rolling back) around the 20GB mark. While the exact cause and nature of my issue remains unknown, I very much expect that the vast majority of my zfs send dump is in fact intact, including data beyond that 20GB checksum error point. I.e. there is a problem around the 20GB mark, but I expect that the remaining 30GB contains "good" data, or at the very least, *mostly* good data.

The CR appears to be only requesting that zfs receive stop at the 20GB mark, but {new feature} allows the failed restore attempt to be mountable, in an unknown/known bad state. I'd much prefer that zfs receive continue on error too, thus giving it the full 50GB to process and attempt to repair, rather than only the data up until the point that it encountered its first problem.

Without knowing much about the actual on-disk format, metadata and structures I can't be sure, but the fs is going to have a much better chance at recovering when there is more data available across the entire length of the fs, right? I know from my linux days that the ext2/3 superblocks were distributed across the full disk, so the more of the disk that it can attempt to read, the better the chance that it'll find more correct metadata to use in an attempt to repair the FS.

And of course the second benefit of reading more of the data stream, past an error, is that more user data will at least have a chance of being recovered. If it stops half way, it has _no_ chance of recovering that data, so I favor my odds of letting it go on to at least try :)

Or is that an entirely new CR in itself?

Jonathan
> Hi Folks,
>
> The error that I'm getting on restore is:
> receiving full stream of faith/home@09-08-08 into Z/faith/home@09-08-08
> cannot receive: invalid stream (checksum mismatch)

Did you find a workaround? I have the same problem, except with a replication set. This is with b95, which performed the send to a file on an nfs mount. I'm also using b95 and/or b96 to receive the file. I get 14G or so read and then this error happens.

Is it possible to turn off the checksum? Just so that I can recover what data is there?
Hey Richard,

I've just seen that somebody else has been caught out by this. Do you think it would be worth adding an RFE to add 'send to file' support to zfs send?

I'll be using data piped to a file myself, and while I'm not worried about corruption myself, if zfs send knows it's sending to a file, it could check the integrity of the file once the operation completes, which would probably have helped these guys. It might also be useful to output a text file containing the checksum so the integrity of the file can be verified at a later date.

Ross
Ross wrote:
> Hey Richard,
>
> I've just seen that somebody else has been caught out by this. Do you think it would be worth adding an RFE to add 'send to file' support to zfs send?

No. Pipes are a foundation of UNIX and are much more flexible than a fixed file interface (as shown below... :-)

> I'll be using data piped to a file myself, and while I'm not worried about corruption myself, if zfs send knows it's sending to a file, it could check the integrity of the file once the operation completes, which would probably have helped these guys.

I filed CR 6736837, improve send/receive fault tolerance, which keeps the pipe structure intact. Feel free to pile on.
http://bugs.opensolaris.org/view_bug.do?bug_id=6736837

> It might also be useful to output a text file containing the checksum so the integrity of the file can be verified at a later date.

If you redirect to a file, you can use existing checksum commands. You could even check that against what flows out of the send.

zfs send | tee filename | digest -a md5 > filename.md5

-- richard
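Filling in that pattern a little: the digest captured at send time can be compared against the stored stream file whenever you want to confirm it is still intact. A sketch, with hypothetical dataset and file names:

    # capture the stream and its md5 in one pass
    zfs send faith/home@09-08-08 | tee /backup/justhome.zfs | \
        digest -a md5 > /backup/justhome.zfs.md5

    # later: recompute the digest of the stored file and compare
    digest -a md5 /backup/justhome.zfs > /tmp/check.md5
    cmp -s /backup/justhome.zfs.md5 /tmp/check.md5 \
        && echo "stream file matches the digest recorded at send time" \
        || echo "stream file has changed or is corrupt"

Note that this only proves the file still matches what zfs send emitted; it cannot detect corruption that happened inside the send itself, which is what the earlier posts in this thread suspect.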