Charles Cazabon
2013-Oct-01 21:12 UTC
Is `btrfsck --repair` supposed to actually repair problems?
Greetings,

I've been using btrfs for bulk-storage purposes for a couple of years now (on vanilla linux-stable kernels on a few machines). I recently set up a new filesystem and have been copying data to it, when I had an unrelated kernel lockup. As expected, after rebooting btrfsck reported some checksum verify errors like:

    checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
    checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
    checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E

There's a few dozen of these.

Running btrfsck with the --repair option, however, does not appear to fix these problems. I'll attach the complete output of running with the --repair option; running btrfsck in check-only mode afterwards reports largely the same checksum errors as it did originally, prior to "repair".

Shouldn't `btrfsck --repair` actually repair these errors? Am I doing something wrong?

System details:

- current kernel is linux-stable 3.9.11 x86_64
- btrfs-progs built from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, which doesn't appear to have changed in a long time
- filesystem is 16.4TiB btrfs on LVM on md_crypt on an mdadm RAID-6 array. I know this is perhaps an odd setup, but btrfs didn't support RAID-6 when I started using it.

Any advice appreciated. Thanks,

Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------
Chris Murphy
2013-Oct-01 22:01 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
On Oct 1, 2013, at 3:12 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

> Greetings,
>
> I've been using btrfs for bulk-storage purposes for a couple of years now (on
> vanilla linux-stable kernels on a few machines). I recently set up a new
> filesystem and have been copying data to it, when I had an unrelated kernel
> lockup. As expected, after rebooting btrfsck reported some checksum verify
> errors like:
>
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
>
> There's a few dozen of these.
>
> Running btrfsck with the --repair option, however, does not appear to fix
> these problems. I'll attach the complete output of running with the --repair
> option; running btrfsck in check-only mode afterwards reports largely the same
> checksum errors as it did originally, prior to "repair".
>
> Shouldn't `btrfsck --repair` actually repair these errors? Am I doing
> something wrong?

It looks like the file system thinks the file has changed and isn't matching checksum. That's not obviously fixable unless both data and metadata are raid1.

More information is needed:

    btrfs fi df <mountpoint>
    btrfs show
    dmesg | grep -i btrfs
    dmesg | grep ata<port#>

I'm assuming it's a SATA drive, and if so you can first run the last command without a port number to figure out what port the drive is on. For me I get a line:

    [ 1.388091] ata1.00: ATA-8: WDC WD5000BEVT-22ZAT0, 01.01A01, max UDMA/133

So I'd use

    dmesg | grep ata1

Do that for all drives in the btrfs volume. And report the version of btrfs-progs.

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Charles Cazabon
2013-Oct-01 23:46 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
Hi, Chris,

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 3:12 PM, Charles Cazabon
> <charlesc-lists-btrfs@pyropus.ca> wrote:
>
> > Running btrfsck with the --repair option, however, does not appear to fix
> > these [checksum verify] problems. I'll attach the complete output of
> > running with the --repair option; running btrfsck in check-only mode
> > afterwards reports largely the same checksum errors as it did originally,
> > prior to "repair".
>
> It looks like the file system thinks the file has changed and isn't matching
> checksum. That's not obviously fixable unless both data and metadata are
> raid1.

Perhaps this wasn't clear from my original message, but I'm not using btrfs' RAID or lvm-like capabilities. The filesystem is on an LVM logical volume, with the actual underlying storage being an 8-disk RAID-6 array (mdadm array). So the stack is:

    vanilla btrfs filesystem (not using subvolumes, btrfs' multiple device
        support or any other advanced features)
    LVM logical volume
    LVM volume group
    LVM physical volume
    md_crypt / LUKS encrypted volume
    mdadm RAID-6 array
    8 x SATA disks

> More information is needed:

Okay:

    # btrfs fi df /media/bigbackup/
    Data: total=4.53TB, used=4.22TB
    System, DUP: total=8.00MB, used=508.00KB
    System: total=4.00MB, used=0.00
    Metadata, DUP: total=18.00GB, used=17.13GB
    Metadata: total=8.00MB, used=0.00

> btrfs show

This fails with `btrfs: unknown token 'show'`.

> dmesg | grep -i btrfs

After mounting the filesystem read-only, the following ends up in the syslog:

    [13333.117462] Btrfs loaded
    [13333.157078] device label bigbackup devid 1 transid 5249 /dev/mapper/extbackup-bigbackup
    [13333.158445] btrfs: disk space caching is enabled

That's the only btrfs-related info that gets logged.

> dmesg | grep ata<port#>
>
> I'm assuming it's a SATA drive,

As I say, it's 8 disks (yes, SATA). What info exactly do you want about the disks and ports?

The log is quite noisy because these are behind SATA port multipliers, and there are a bunch of other SATA drives in the system. But if I filter out all the extra stuff, then when I power up the port-multiplier boxes that the disks are in, what's logged is 126 lines (much of it garbage from not all possible multiplier ports being in use), log attached. The 8 disks are, as you can see, all identical Seagate units:

    ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133

> And report the version of btrfs-progs.

    Btrfs v0.20-rc1-358-g194aa4a-dirty

That's what I get when I build from the git repository at git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

git insists I'm fully up to date, though the last time I pulled before today was over a month ago.

Charles
Chris Murphy
2013-Oct-02 00:42 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
On Oct 1, 2013, at 5:46 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

> # btrfs fi df /media/bigbackup/
> Data: total=4.53TB, used=4.22TB
> System, DUP: total=8.00MB, used=508.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=18.00GB, used=17.13GB
> Metadata: total=8.00MB, used=0.00

Since there's only one copy of the data, there isn't a way to repair it, it just notes that there is a checksum mismatch.

>> btrfs show
>
> This fails with `btrfs: unknown token 'show'`.

I meant 'btrfs fi show'

> As I say, it's 8 disks (yes, SATA). What info exactly do you want about the
> disks and ports?

Looking for problems that relate to this one.

When was the last time you did a scrub on the md device? And what was the result?

What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

This sounds to me like it could be a bit flip, and btrfs is catching it but doesn't have a 2nd copy of the data. Just a guess.

Chris Murphy
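For readers unfamiliar with the md scrub asked about above: on Linux it is normally requested by writing `check` to the array's `sync_action` file in sysfs, and the outcome is read from `mismatch_cnt`. The following is only a minimal sketch under stated assumptions: the array name `md0` is a placeholder (the thread never names the md device), and `SYSFS_ROOT` is a hypothetical override, not a kernel feature, included so the functions can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch: request a read-only consistency check of an md array and show
# its state. SYSFS_ROOT is a hypothetical override for dry runs and
# defaults to the real /sys tree.

md_request_check() {
    # $1 is the md device name, e.g. md0 (placeholder).
    md_dir="${SYSFS_ROOT:-/sys}/block/$1/md"
    if [ ! -d "$md_dir" ]; then
        echo "no md directory for $1" >&2
        return 1
    fi
    # Writing "check" asks md to read every stripe and compare parity
    # without rewriting anything.
    echo check > "$md_dir/sync_action"
}

md_status() {
    # Print the current action and the mismatch counter on one line.
    md_dir="${SYSFS_ROOT:-/sys}/block/$1/md"
    printf '%s: sync_action=%s mismatch_cnt=%s\n' "$1" \
        "$(cat "$md_dir/sync_action")" "$(cat "$md_dir/mismatch_cnt")"
}
```

Run as root, this would be `md_request_check md0`, followed by `md_status md0` once `sync_action` has gone back to `idle`. A check most likely cannot be started while another sync operation is still in progress.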
Charles Cazabon
2013-Oct-02 03:13 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 5:46 PM, Charles Cazabon wrote:
> >
> > # btrfs fi df /media/bigbackup/
> > Data: total=4.53TB, used=4.22TB
> > System, DUP: total=8.00MB, used=508.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=18.00GB, used=17.13GB
> > Metadata: total=8.00MB, used=0.00
>
> Since there's only one copy of the data, there isn't a way to repair it, it
> just notes that there is a checksum mismatch.

Ah, I'm not looking to repair the files -- I can recopy the files easily enough, and rsync will pick up any files whose contents have been corrupted. I'd like to get the filesystem fixed, though. i.e., even deleting the affected files would be fine.

This is a new filesystem to replace my existing (full) backups filesystem. The existing backups one is ext4 but this new one is too big for mkfs.ext4 to handle, so btrfs it is. I wasn't expecting problems as I've been running btrfs for other purposes for years.

Am I misunderstanding something here? It seems to me like btrfsck is telling me there's problems with the filesystem itself when it continues to report these checksum errors even after a `btrfsck --repair`.

> I meant 'btrfs fi show'

    Label: 'bigbackup' uuid: c18dfd04-d931-4269-b999-e94df3b1918c
        Total devices 1 FS bytes used 4.23TB
        devid 1 size 16.37TB used 4.56TB path /dev/dm-9

> > As I say, it's 8 disks (yes, SATA). What info exactly do you want about
> > the disks and ports?
>
> Looking for problems that relate to this one.
>
> When was the last time you did a scrub on the md device? And what was the
> result?

It's a brand new array. The initial sync is actually still going on (about half complete; it'll take several days to initialize an array this size on this hardware).

So in short, the underlying array is clean.

> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

    Warning: device does not support SCT Error Recovery Control command

> This sounds to me like it could be a bit flip, and btrfs is catching it but
> doesn't have a 2nd copy of the data. Just a guess.

If one of the disks flipped a bit, it would be caught at the md RAID-6 level, no?

Charles
Chris Murphy
2013-Oct-02 03:50 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
On Oct 1, 2013, at 9:13 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

> Ah, I'm not looking to repair the files -- I can recopy the files easily
> enough, and rsync will pick up any files whose contents have been corrupted.
> I'd like to get the filesystem fixed, though. i.e., even deleting the
> affected files would be fine.

If you run a scrub, dmesg should contain the path for affected files which you can then delete. If it's just a checksum problem with files, the file system doesn't need fixing. I'd wait until the raid is finished syncing.

> This is a new filesystem to replace my existing
> (full) backups filesystem. The existing backups one is ext4 but this new one
> is too big for mkfs.ext4 to handle, so btrfs it is. I wasn't expecting
> problems as I've been running btrfs for other purposes for years.

It's still experimental. I'd expect almost anything.

> Am I misunderstanding something here? It seems to me like btrfsck is telling
> me there's problems with the filesystem itself when it continues to report
> these checksum errors even after a `btrfsck --repair`.

Well I haven't seen the entire btrfsck or the entire dmesg so like I said I'm sorta guessing it's just a file problem, but maybe you've stumbled on something else.

> It's a brand new array. The initial sync is actually still going on (about
> half complete; it'll take several days to initialize an array this size on
> this hardware).

OK maybe someone else can comment if this is expected to work, maybe on linux-raid even.

But now you tell us this? You didn't think it might be important to mention that you've got a raid initially syncing, that you've formatted btrfs, copied files over, and at some point you got a kernel lockup, and then once restarted you ran a btrfsck?

I would expect problems with any file system, with a system that locks up while the raid is still syncing.

> So in short, the underlying array is clean.

Well except you've got either file system corruption, or corrupt files.

>> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?
>
> Warning: device does not support SCT Error Recovery Control command

These drives aren't well suited for RAID of any kind. Hopefully, at least, you will change the scsi layer timeout for each drive using

    echo 121 >/sys/block/sdX/device/timeout

That may not even be long enough, but without more information about what the ERC timeout of the drive is, which the manufacturer might have in the exhaustive version of their spec book, it's a guess. Consumer drives try to recover for up to a couple minutes. If the scsi layer resets in 30 seconds (the default) then sector problems are never fixed because the drive never reports the read error back to the kernel. And md won't write over the bad sector with reconstructed data. So you get an accumulation of bad sectors, rather than them being taken care of normally.

Your application layer might get frustrated, or worse, with up to 2 minute delays in the storage stack.

>> This sounds to me like it could be a bit flip, and btrfs is catching it but
>> doesn't have a 2nd copy of the data. Just a guess.
>
> If one of the disks flipped a bit, it would be caught at the md RAID-6 level,
> no?

No. In normal operation the parity is never consulted, so it would have no idea if there's a flipped bit. The hardware ought to catch it, but we know that isn't always true.

Chris Murphy
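The per-drive timeout change described above has to be repeated for every member of the array, so a small loop may be convenient. This is a minimal sketch rather than a recommended procedure: the device names in the usage line are placeholders, the value 121 is simply the one suggested in the message, and `SYSFS_ROOT` is a hypothetical override (not a kernel feature) that lets the function be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: set the SCSI command timeout (in seconds) for a list of drives.
# Usage: set_scsi_timeout SECONDS DEV [DEV...]
# SYSFS_ROOT is a hypothetical override for dry runs; default is /sys.

set_scsi_timeout() {
    sst_seconds="$1"
    shift
    sst_rc=0
    for sst_dev in "$@"; do
        sst_file="${SYSFS_ROOT:-/sys}/block/$sst_dev/device/timeout"
        if [ -w "$sst_file" ]; then
            echo "$sst_seconds" > "$sst_file"
        else
            # Report and carry on so one missing drive does not stop the rest.
            echo "skipping $sst_dev: $sst_file not writable" >&2
            sst_rc=1
        fi
    done
    return "$sst_rc"
}
```

As root, something like `set_scsi_timeout 121 sda sdb sdc` would apply it. Values written to sysfs do not survive a reboot, so the call would have to be reissued at boot time.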
Charles Cazabon
2013-Oct-02 16:53 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 9:13 PM, Charles Cazabon wrote:
> >
> > Ah, I'm not looking to repair the files -- I can recopy the files easily
> > enough, and rsync will pick up any files whose contents have been corrupted.
>
> If you run a scrub, dmesg should contain the path for affected files which
> you can then delete. If it's just a checksum problem with files, the file
> system doesn't need fixing.

Okay, I'll do that.

> I'd wait until the raid is finished syncing.

Strictly speaking, this shouldn't be necessary. mdadm arrays are fully usable from creation during the initial sync; the system tracks which bits have been initialized and which haven't.

> > It's a brand new array. The initial sync is actually still going on
> > (about half complete; it'll take several days to initialize an array this
> > size on this hardware).
>
> OK maybe someone else can comment if this is expected to work, maybe on
> linux-raid even.

https://raid.wiki.kernel.org/index.php/Initial_Array_Creation talks about the initial (re)sync. It explicitly states:

    This can take quite a time and the array is not fully resilient whilst
    this is happening (it is however fully useable).

> But now you tell us this? You didn't think it might be important to mention
> that you've got a raid initially syncing, that you've formatted btrfs,
> copied files over, and at some point you got a kernel lockup, and then once
> restarted you ran a btrfsck?

Yes. The array uses a write-intent bitmap, so the kernel lockup during the initial sync does not cause corruption; when the system is brought back up, it may re-initialize a portion that it had already initialized (i.e. it's not 100% efficient), but it doesn't result in corruption.

> I would expect problems with any file system, with a system that locks up
> while the raid is still syncing.

No, this doesn't cause any particular problems. It's just like the normal case of a single-drive filesystem and the system crashing during a write. You just fsck to address any problems the interrupted write caused and recover the journal (if applicable).

Charles
Chris Murphy
2013-Oct-02 19:13 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
On Oct 2, 2013, at 10:53 AM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

>> I'd wait until the raid is finished syncing.
>
> Strictly speaking, this shouldn't be necessary. mdadm arrays are fully usable
> from creation during the initial sync; the system tracks which bits have been
> initialized and which haven't.

I know but it's a 16TB array, do you really want to start over from scratch? No. And neither do most people. So this isn't a use case that's probably getting a ton of testing.

>> But now you tell us this? You didn't think it might be important to mention
>> that you've got a raid initially syncing, that you've formatted btrfs,
>> copied files over, and at some point you got a kernel lockup, and then once
>> restarted you ran a btrfsck?
>
> Yes. The array uses a write-intent bitmap, so the kernel lockup during the
> initial sync does not cause corruption; when the system is brought back up, it
> may re-initialize a portion that it had already initialized (i.e. it's not
> 100% efficient), but it doesn't result in corruption.

OK except there is corruption. We just don't know for sure if it's just files or if it's the file system. If you don't know already what caused it, it's not really correct to say what doesn't result in corruption.

Also the write-intent bitmap isn't configured by default, and you didn't previously say that it was. Is this an internal or external bitmap?

>> I would expect problems with any file system, with a system that locks up
>> while the raid is still syncing.
>
> No, this doesn't cause any particular problems. It's just like the normal
> case of a single-drive filesystem and the system crashing during a write.
> You just fsck to address any problems the interrupted write caused and recover
> the journal (if applicable).

If only hardware worked exactly per spec, and also didn't lie about committing data to disk rather than merely keeping it in cache, this may be true. But hardware lies, it has bugs. And the kernel isn't bug free either.

Chris Murphy
Charles Cazabon
2013-Oct-02 19:56 UTC
Re: Is `btrfsck --repair` supposed to actually repair problems?
Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 2, 2013, at 10:53 AM, Charles Cazabon wrote:
>
> >> I'd wait until the raid is finished syncing.
> >
> > Strictly speaking, this shouldn't be necessary.
>
> I know but it's a 16TB array, do you really want to start over from scratch?
> No. And neither do most people. So this isn't a use case that's probably
> getting a ton of testing.

Fair enough. The sync should be done late today or early tomorrow, and I am waiting for it to complete before continuing to debug this. I'll start with the scrub you mentioned.

> Also the write-intent bitmap isn't configured by default, and you didn't
> previously say that it was. Is this an internal or external bitmap?

Internal.

Thanks for your assistance to date.

Charles
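The scrub-then-read-dmesg step suggested earlier in the thread can be sketched as a small filter that pulls file paths out of the kernel's checksum-error messages. This is an illustration only: it assumes the scrub messages contain the text "checksum error at logical" and end with "(path: ...)", which matches kernels of roughly this era but may differ on other versions, and the function name `scrub_error_paths` is made up for this example.

```shell
#!/bin/sh
# Sketch: list the unique file paths named in btrfs scrub checksum-error
# messages. Reads kernel log text on stdin, prints one path per line.
# Assumption: each relevant line ends with "(path: <file>)".

scrub_error_paths() {
    # Keep only lines that report a checksum error, capture the text
    # between "(path: " and the final ")", and drop duplicates.
    sed -n 's/.*checksum error at logical.*(path: \(.*\))$/\1/p' | sort -u
}
```

With the filesystem mounted at the mount point shown in the thread, usage might be `btrfs scrub start -B /media/bigbackup` followed by `dmesg | scrub_error_paths`. The `-B` option keeps the scrub in the foreground so the log is complete when the second command runs.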