I'm having a problem with a newly extended btrfs volume. It is running on Debian testing with an almost stock 3.1.0 kernel, with a few patches to get it compiling for my LaCie Network Space v2 'Classic' (http://lacie.nas-central.org/wiki/Category:Network_Space_2). It had been working perfectly for about a year and a half with just the internal sda2, and I've had luck with extending disks before, so I added a 3TB USB drive.

My dad wanted to back up his computer, so I gave him an SMB share and he copied everything to it using Windows Vista's built-in copy. The copy checked out when we compared sizes and file/folder counts, but we didn't verify anything else. He now needs to restore the backup and he is getting zero-sized files. I checked some of the same files on the NAS itself and I get:

> ERROR: cannot read `<filename>' (Input/output error)

I checked my kernel logs to see what the deal was and didn't see anything that made sense, so I ran a scrub:

sudo /sbin/btrfs scrub status .
> scrub status for 3a29a904-ad28-4e2a-8e80-df29d8d5fafc
> scrub started at Thu May 31 10:15:39 2012 and finished after 27581 seconds
> total bytes scrubbed: 1.00TB with 23929636 errors
> error details: csum=23929636
> corrected errors: 0, uncorrectable errors: 23929636, unverified errors: 0

These are the most frequent lines in the kern.log:

> 1189 <datetime> <hostname> kernel: scrub_fixup: <number> callbacks suppressed
> 1442 <datetime> <hostname> kernel: btrfs_readpage_end_io_hook: <number> callbacks suppressed
> 11900 <datetime> <hostname> kernel: btrfs: unable to fixup at <number>
> 14538 <datetime> <hostname> kernel: btrfs csum failed ino <number> off <number> csum <number> private <number>
> 237549 <datetime> <hostname> kernel: bio too big device sdb1 (<number> > <number>)

Many are from the scrub that I ran after I was getting I/O errors, but all of the non-scrub-related errors were happening before I ran the scrub.
If you need more info, my kern.log run through 7z is 322kB, but I don't know if that will help.

I rebooted the device out of desperation to see if it would help, and when it came back up the btrfs volume wouldn't mount using the original sda2, but mounted fine when I mounted sdb1:

> May 31 10:05:48 eeyore kernel: device fsid 3a29a904-ad28-4e2a-8e80-df29d8d5fafc devid 1 transid 132729 /dev/sda2
> May 31 10:05:48 eeyore kernel: btrfs: failed to read chunk tree on sda2
> May 31 10:05:48 eeyore kernel: btrfs: open_ctree failed

Trying "mount /dev/sdb1 /var/lib/btrfs" worked:

> May 31 10:11:14 eeyore kernel: device fsid 3a29a904-ad28-4e2a-8e80-df29d8d5fafc devid 2 transid 132729 /dev/sdb1

Then a mount -a worked to get /home mounted again.

So, the question: What went wrong? Is there any hope of getting my Dad's data back? How should I proceed from here? Delete the volume and start from scratch? Is btrfs not compatible with the size of disk/volume on this architecture? Is the external disk broken? Should I use a different filesystem? Why were there no indications of error when copying? If worse comes to worse, how can I tell which files are bad? Can scrub go through and unlink the bad files?

Thanks for such a great file system and thanks for any help you can give.
--
Randall Mason
randall@mason.ch

Here is some more information that may help:

fstab:
> UUID=3a29a904-ad28-4e2a-8e80-df29d8d5fafc /var/lib/btrfs btrfs defaults 0 0
> UUID=3a29a904-ad28-4e2a-8e80-df29d8d5fafc /home btrfs defaults,subvol=current 0 0

sudo /sbin/btrfs filesystem show /dev/sdb1
> Label: none uuid: 3a29a904-ad28-4e2a-8e80-df29d8d5fafc
> Total devices 2 FS bytes used 1.04TB
> devid 2 size 2.73TB used 185.50GB path /dev/sdb1
> devid 1 size 922.19GB used 922.19GB path /dev/sda2
> Btrfs Btrfs v0.19

dpkg -l | grep -i btrfs
> ii btrfs-tools 0.19+20120328-1 Checksumming Copy on Write Filesystem utilities

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, Randall, you wrote on 01.06.12:

> I'm having a problem with a newly extended btrfs volume. It is
> running on debian testing with an almost stock 3.1.0 kernel with a
> little bit of patches

You should use a newer kernel, e.g. 3.3.7 or 3.4.

Best regards!
Helmut
Helmut Hullen <Hullen <at> t-online.de> writes:

> You wrote on 01.06.12:
>
> > I'm having a problem with a newly extended btrfs volume. It is
> > running on debian testing with an almost stock 3.1.0 kernel with a
> > little bit of patches
>
> You should use a newer kernel, e.g. 3.3.7 or 3.4

Hello!

I've just upgraded to 3.4.0 from git.kernel.org and I'm still running into problems. I checked the Problems FAQ and there doesn't seem to be anything that matches my problem. Here's a set of errors when I try to rsync to the volume in question.

I run:

rmason@<btrfsHost>:~$ sudo rsync -avH -e "ssh -4" --delete --progress --inplace root@<otherHost>.mason.ch:/home/randall/ /home/randall/
root@<otherHost>.mason.ch's password:
receiving incremental file list
anime/[SFS] Break Blade [BD 1920x1080 x264 FLAC][BG]/[SFS] Break Blade - 02v2 [BD 1920x1080 x264 FLAC][BG].mkv
2165440 0% 1.96MB/s 0:28:24
rsync: write failed on "/home/randall/anime/[SFS] Break Blade [BD 1920x1080 x264 FLAC][BG]/[SFS] Break Blade - 02v2 [BD 1920x1080 x264 FLAC][BG].mkv": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (11135 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]

I tail messages.log during the rsync command and this is what I get:

Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2052096 csum 2566472073 private 2923703033
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2052096 csum 2566472073 private 2923703033
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2138112 csum 2566472073 private 214370805
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2408448 csum 2566472073 private 3284052016
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2674688 csum 2566472073 private 669053144
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 2940928 csum 2566472073 private 2266756296
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 3211264 csum 2566472073 private 3658566492
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 3477504 csum 2566472073 private 4196274894
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 3743744 csum 2566472073 private 2836083845
Jun 7 04:42:51 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 4014080 csum 2566472073 private 2057760174
Jun 7 04:42:56 <btrfsHost> kernel: btrfs_readpage_end_io_hook: 334 callbacks suppressed
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 93687808 csum 2566472073 private 3122689738
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 93954048 csum 2566472073 private 2432138128
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 94220288 csum 2566472073 private 176960849
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 94490624 csum 2566472073 private 3686755414
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 94756864 csum 2566472073 private 3413998024
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 95023104 csum 2566472073 private 1213584997
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 95293440 csum 2566472073 private 3260773985
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 95559680 csum 2566472073 private 859919222
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 95825920 csum 2566472073 private 330759519
Jun 7 04:42:56 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 96096256 csum 2566472073 private 2517217076
Jun 7 04:43:01 <btrfsHost> kernel: btrfs_readpage_end_io_hook: 503 callbacks suppressed
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 231006208 csum 2566472073 private 4003729605
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 231272448 csum 2566472073 private 3631855256
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 231542784 csum 2566472073 private 610953759
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 231809024 csum 2566472073 private 3113324635
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 232075264 csum 2566472073 private 3907299018
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 232345600 csum 2566472073 private 4261889602
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 232611840 csum 2566472073 private 2892910093
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 232878080 csum 2566472073 private 3857339386
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 233148416 csum 2566472073 private 3410895248
Jun 7 04:43:01 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 233414656 csum 2566472073 private 4090819852
Jun 7 04:43:06 <btrfsHost> kernel: btrfs_readpage_end_io_hook: 508 callbacks suppressed
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 369664000 csum 2566472073 private 1949454837
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 369930240 csum 2566472073 private 2867367336
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 370200576 csum 2566472073 private 299690505
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 370466816 csum 2566472073 private 2449128099
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 370733056 csum 2566472073 private 3831761403
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 371003392 csum 2566472073 private 594867053
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 371269632 csum 2566472073 private 2391764021
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 371535872 csum 2566472073 private 1250201429
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 371806208 csum 2566472073 private 3704878237
Jun 7 04:43:06 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 372072448 csum 2566472073 private 1588417531
Jun 7 04:43:11 <btrfsHost> kernel: btrfs_readpage_end_io_hook: 515 callbacks suppressed
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 510197760 csum 2566472073 private 1297006436
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 510464000 csum 2566472073 private 1281951163
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 510730240 csum 2566472073 private 1649630795
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 511000576 csum 2566472073 private 621478547
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 511266816 csum 2566472073 private 709499052
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 511533056 csum 2566472073 private 1772067950
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 511803392 csum 2566472073 private 4181868570
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 512069632 csum 2566472073 private 3607792064
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 512335872 csum 2566472073 private 3698346856
Jun 7 04:43:11 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 512606208 csum 2566472073 private 2503998955
Jun 7 04:43:16 <btrfsHost> kernel: btrfs_readpage_end_io_hook: 500 callbacks suppressed
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 646713344 csum 2566472073 private 37706649
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 646979584 csum 2566472073 private 3463060199
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 647249920 csum 2566472073 private 2139594835
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 647516160 csum 2566472073 private 369916444
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 647782400 csum 2566472073 private 1041024394
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 648052736 csum 2566472073 private 1131002593
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 648318976 csum 2566472073 private 3304466401
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 648585216 csum 2566472073 private 2262830394
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 648855552 csum 2566472073 private 93740949
Jun 7 04:43:16 <btrfsHost> kernel: btrfs csum failed ino 1258837 off 649121792 csum 2566472073 private 2413285993

And I made sure that the inode in those messages is exactly the file that's being transferred:

sudo find /home/randall/ -inum 1258837
/home/randall/anime/[SFS] Break Blade [BD 1920x1080 x264 FLAC][BG]/[SFS] Break Blade - 02v2 [BD 1920x1080 x264 FLAC][BG].mkv

Any more help would be appreciated. Why is this happening, and how can I get my data back?

-Randall Mason
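The inode lookup above can be extended to a whole log. What follows is only a minimal sketch, not something posted in the thread: the function name csum_failed_inodes is made up for illustration, and the log path and search directory in the usage comment are assumptions to adjust. It collects the distinct inode numbers from "btrfs csum failed ino" messages so that each one can be passed to find -inum, as was done by hand above.

```shell
#!/bin/sh
# csum_failed_inodes (hypothetical helper name): read kernel log text on
# stdin and print each distinct inode number that appears in a
# "btrfs csum failed ino N ..." message, one per line, in numeric order.
csum_failed_inodes() {
    sed -n 's/.*btrfs csum failed ino \([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Usage sketch (log path and search root are assumptions):
#   csum_failed_inodes < /var/log/kern.log |
#   while read -r ino; do
#       sudo find /home/randall/ -inum "$ino"
#   done
```

Inode numbers on btrfs are only unique within one subvolume, so the search is best limited to the subvolume the messages refer to.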
Hello, Randall, you wrote on 07.06.12:

[...]
> I've just upgraded to 3.4.0 from git.kernel.org and I'm still running
> into problems. I checked the Problems FAQ and there doesn't seem to
> be anything that matches my problem.
[...]
> Any more help would be appreciated. Why is this happening, and how
> can I get my data back?

Why? I don't know - sorry.

Data back: take your last backup. (I know this way to get my data back ...)

"btrfs" is still under heavy construction, and restoring damaged data is still problematic work. You should always have a valid backup (better: several backups).

Best regards!
Helmut
Randall Mason posted on Thu, 07 Jun 2012 12:13:53 +0300 as excerpted:

> Any more help would be appreciated. Why is this happening, and how can
> I get my data back?

Well, any data that's important is by definition backed up (no matter the filesystem used), or by definition you didn't consider it /that/ important, being willing to play games with losing it in the first place.

And btrfs is still an experimental filesystem, with big warnings on both the kernel option enabling it and on the wiki (as well as on this list) that it's not fit for anything but testing. So you can't really consider anything on btrfs more than testing data that you're willing to lose, with a primary copy as well as its normal backups on something other than btrfs, so that if you lose your btrfs copy (which was only for testing anyway) it's no big deal, because you have both the primary and the backup copies (tested; a backup that's not tested isn't yet a complete backup).

So to get your data back, simply grab the primary or backup copy that you copied your btrfs testing data from. No big deal. If you didn't have such copies, then by definition you didn't consider your data valuable in the first place, so again, no big deal losing what could only have been testing data, used for testing the still experimental btrfs.

But to answer your question about the checksum errors: those are btrfs checksum failures, which under heavy load (as in an rsync) it can occasionally report (due to bugs in the still experimental btrfs) even when the data is actually fine. Try accessing a smaller chunk of data at a time, fewer files, or if it's a single large file, use dd or the like to copy individual chunks of it to a non-experimental filesystem, say ext3/4, reiserfs (which I use and which Chris Mason worked on for years before btrfs), or xfs. Let the btrfs filesystem rest between accesses.
FWIW, I was testing btrfs myself with the kernel 3.4 rcs, but decided it was still too experimental for me, so I'm back on reiserfs, which has a bad rep from its early days, but which has been impressively solid for me even when not fully reliable hardware was triggering hard lockups and thus hard reboots. But the technique mentioned above did allow me to access the data on the btrfs filesystems as I was copying it back to reiserfs (I had backups, but the data had changed a bit while I was testing, so getting the data off of the testing/btrfs copies saved me from having to redo those changes).

If you're still getting checksum errors when accessing only a few megs at a time, then the data likely really is damaged, and the checksum errors are letting you know that. That's what btrfs is ultimately designed to do, only it's still experimental and doesn't always work the way it's designed to, at present.

Also note that btrfs' compression option puts a bit more CPU load on it. I was using compression and think that was part of my problem. I think I'd have had fewer issues had I not been using compression. But of course, that doesn't help when trying to read data that's already on btrfs in compressed form.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
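The dd approach described above might look roughly like this. It is a sketch under stated assumptions, not the exact commands used in the thread: the function name copy_in_chunks, the 4 MiB default chunk size and the one-second pause are all illustrative choices, and stat -c %s assumes GNU coreutils (as on Debian). A chunk that cannot be read is reported on stderr and skipped, so the remaining chunks still land at their original offsets in the destination.

```shell
#!/bin/sh
# copy_in_chunks SRC DST [CHUNK_MIB] [PAUSE_SECONDS]   (hypothetical helper)
# Copy SRC to DST one chunk at a time, sleeping between chunks.
# Chunks that fail to read are reported and left unwritten in DST.
copy_in_chunks() {
    src=$1
    dst=$2
    chunk_mib=${3:-4}   # chunk size in MiB (arbitrary default)
    pause=${4:-1}       # seconds to rest between chunks (arbitrary default)

    size=$(stat -c %s "$src") || return 1   # GNU stat; reads metadata only
    chunk_bytes=$((chunk_mib * 1024 * 1024))
    chunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))

    : > "$dst" || return 1                  # start from an empty destination
    failed=0
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # skip= and seek= count in units of bs, so chunk i is written at the
        # same byte offset it was read from; conv=notrunc keeps earlier chunks.
        if ! dd if="$src" of="$dst" bs="$chunk_bytes" skip="$i" seek="$i" \
                count=1 conv=notrunc 2>/dev/null; then
            echo "chunk $i (byte offset $((i * chunk_bytes))) could not be copied" >&2
            failed=1
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    return "$failed"
}

# Usage sketch (both paths are placeholders):
#   copy_in_chunks /home/randall/some-large-file /mnt/ext4/some-large-file 4 2
```

If chunks still fail at this size, that matches the case described above where the data is likely to be genuinely damaged.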
On Fri, Jun 01, 2012 at 11:53:18AM +0300, Randall Mason wrote:
> sudo /sbin/btrfs scrub status .
> > scrub status for 3a29a904-ad28-4e2a-8e80-df29d8d5fafc
> > scrub started at Thu May 31 10:15:39 2012 and finished after 27581 seconds
> > total bytes scrubbed: 1.00TB with 23929636 errors
> > error details: csum=23929636
> > corrected errors: 0, uncorrectable errors: 23929636, unverified errors: 0
[...]
>
> These are the most frequent lines in the kern.log:
> > 1189 <datetime> <hostname> kernel: scrub_fixup: <number> callbacks suppressed
> > 1442 <datetime> <hostname> kernel: btrfs_readpage_end_io_hook: <number> callbacks suppressed
> > 11900 <datetime> <hostname> kernel: btrfs: unable to fixup at <number>
> > 14538 <datetime> <hostname> kernel: btrfs csum failed ino <number> off <number> csum <number> private <number>

> > 237549 <datetime> <hostname> kernel: bio too big device sdb1 (<number> > <number>)

As was reported via IRC in the past, this message can appear when a disk connected via USB is grouped in the same filesystem with other disks that are connected normally. The bio size is not the same for the disks, and the write requests may get dropped. This explains the high number of csum failures, and I'm afraid the data are lost.

I think that the issue could be more general. I once put an old IDE disk and a new 4k-sector 2TB disk into the same filesystem. I don't remember what happened next (power failure or forced reboot), but when I mounted the filesystem I was not able to remove the 2TB disk from the set and saw csum problems plus the 'bio too big' message. It was quite some time ago and I think it was a 3.1 based kernel with the known bug when flushing data to multiple devices. I wasn't able to reproduce the bug though.

> So, the question: What went wrong? Is there any hope of getting my
> Dad's data back? How should I proceed from here? Delete the volume
> and start from scratch? Is btrfs not compatible with the size of
> disk/volume on this architecture? Is the external disk broken?
> Should I use a different filesystem? Why were there no indications of
> error when copying? If worse comes to worse, how can I tell which
> files are bad? Can scrub go through and unlink the bad files?

> > UUID=3a29a904-ad28-4e2a-8e80-df29d8d5fafc /var/lib/btrfs btrfs defaults 0 0
> > UUID=3a29a904-ad28-4e2a-8e80-df29d8d5fafc /home btrfs defaults,subvol=current 0 0
>
> sudo /sbin/btrfs filesystem show /dev/sdb1
> > Label: none uuid: 3a29a904-ad28-4e2a-8e80-df29d8d5fafc
> > Total devices 2 FS bytes used 1.04TB
> > devid 2 size 2.73TB used 185.50GB path /dev/sdb1
> > devid 1 size 922.19GB used 922.19GB path /dev/sda2

So it's 3TB and 1TB disks; isn't there some issue with >2TB disks?

I think this could be a bug: btrfs should either adapt to the differing characteristics of the disks or refuse to add the device.

david
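One way to see whether two disks really differ in the request size they accept is to compare the block-layer queue limits that Linux exposes in sysfs. This is a rough sketch only: the function name show_queue_limits and the SYSFS_ROOT override (there so the function can be tried against a fake tree) are made up for illustration, and whether these particular limits are the ones behind the 'bio too big' messages in this thread is an assumption, not something the thread confirms.

```shell
#!/bin/sh
# show_queue_limits DEV...   (hypothetical helper)
# Print the per-request size limits of whole disks such as sda or sdb.
# SYSFS_ROOT is an illustrative override; it defaults to the real /sys.
show_queue_limits() {
    sysfs=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        q="$sysfs/block/$dev/queue"
        if [ -r "$q/max_hw_sectors_kb" ] && [ -r "$q/max_sectors_kb" ]; then
            # max_hw_sectors_kb: largest request the hardware/driver accepts.
            # max_sectors_kb: limit currently applied by the block layer.
            printf '%s max_hw_sectors_kb=%s max_sectors_kb=%s\n' \
                "$dev" "$(cat "$q/max_hw_sectors_kb")" "$(cat "$q/max_sectors_kb")"
        else
            printf '%s: no queue limits found under %s\n' "$dev" "$q" >&2
        fi
    done
}

# Usage sketch (whole-disk names, not partitions such as sda2 or sdb1):
#   show_queue_limits sda sdb
```

A much smaller value for the USB disk than for the internal one would be consistent with the mismatch described above.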