Hi,

I'm using Linux 3.7.6 (Gentoo Linux) with btrfs-progs-0.20_rc1_p56 and, for a few days now, I have had some uncorrectable errors:

# btrfs scrub status /
scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
        scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
        total bytes scrubbed: 31.02GB with 6 errors
        error details: csum=6
        corrected errors: 0, uncorrectable errors: 6, unverified errors: 0

I don't know what produced this error (maybe a hard reset or a power cut), but I use an old, non-SSD hard disk.

I discovered this problem through several errors in dmesg when I try to access a file:

[ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2985.169191] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2993.102810] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114213] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114527] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114795] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115097] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115349] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115585] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115956] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.116260] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.116558] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100230] csum_tree_block: 27408 callbacks suppressed
[ 2998.100233] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100406] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100591] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0

If I restart a btrfs scrub, I get these messages:

[ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
[ 3047.835134] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
[ 3047.835137] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0
[ 3047.953751] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[ 3052.349518] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 1102728: metadata leaf (level 0) in tree 5
[ 3052.349521] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 1102728: metadata leaf (level 0) in tree 5
[ 3052.349524] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 21, gen 0
[ 3055.840357] btrfs: unable to fixup (regular) error at logical 556208128 on dev /dev/sda2
[ 3061.032879] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 2645232: metadata leaf (level 0) in tree 5
[ 3061.032882] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 2645232: metadata leaf (level 0) in tree 5
[ 3061.032885] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0
[ 3063.014553] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[ 3067.758444] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 3199880: metadata leaf (level 0) in tree 5
[ 3067.758447] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 3199880: metadata leaf (level 0) in tree 5
[ 3067.758450] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 23, gen 0
[ 3067.822206] btrfs: unable to fixup (regular) error at logical 556208128 on dev /dev/sda2

I tried a LiveCD to run a btrfsck [I have to check its version] but it segfaults during the test.

Today, I can't remove the file (and I can't delete its directory); updatedb runs for hours when it tries to read this file.

So, what is the best way to recover from these errors (as I think that some files are definitely lost)? I would like to identify the corrupted files and delete them.

Regards,
Frederic
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
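[Editorial note: the kernel messages above repeat heavily, but only a handful of distinct block addresses are involved. A minimal sketch, using a few of the dmesg/scrub lines quoted verbatim above, of extracting the distinct logical addresses from such a log; both message variants ("checksum verify failed on N" and "checksum error at logical N") carry the address:]

```shell
# A few kernel log lines, copied from the dmesg/scrub output above.
log='[ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2993.102810] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5'

# Extract the logical byte address from either message form and dedupe.
addrs=$(printf '%s\n' "$log" \
  | grep -oE '(failed on|at logical) [0-9]+' \
  | awk '{print $NF}' | sort -nu)
printf '%s\n' "$addrs"
```

On the full log this yields just 272228352, 556208128 and 26326409216, consistent with the scrub report of only six csum errors.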
To complete my previous request, here is the log of btrfsck 0.20_rc1_p56, which segfaults:

checking extents
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
Csum didn't match
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
Csum didn't match
owner ref check failed [272228352 4096]
repair deleting extent record: key 272228352 168 4096
adding new tree backref on start 272228352 len 4096 parent 5 root 5
Backref 402792448 parent 2 root 2 not found in extent tree
Backref 402792448 root 2 not referenced back 0xa55ce08
Incorrect global backref count on 402792448 found 1 wanted 0
backpointer mismatch on [402792448 4096]
Backref 436817920 parent 2 root 2 not found in extent tree
Backref 436817920 root 2 not referenced back 0xa4b9488
Incorrect global backref count on 436817920 found 1 wanted 0
backpointer mismatch on [436817920 4096]
Backref 518414336 parent 2 root 2 not found in extent tree
Backref 518414336 root 2 not referenced back 0xa55b0c8
Incorrect global backref count on 518414336 found 1 wanted 0
backpointer mismatch on [518414336 4096]
Backref 540577792 parent 2 root 2 not found in extent tree
Backref 540577792 root 2 not referenced back 0x8793588
Incorrect global backref count on 540577792 found 1 wanted 0
backpointer mismatch on [540577792 4096]
owner ref check failed [556208128 4096]
repair deleting extent record: key 556208128 168 4096
adding new tree backref on start 556208128 len 4096 parent 5 root 5
Backref 565420032 parent 2 root 2 not found in extent tree
Backref 565420032 root 2 not referenced back 0xa437620
Incorrect global backref count on 565420032 found 1 wanted 0
backpointer mismatch on [565420032 4096]
Backref 597049344 parent 2 root 2 not found in extent tree
Backref 597049344 root 2 not referenced back 0xa414720
Incorrect global backref count on 597049344 found 1 wanted 0
backpointer mismatch on [597049344 4096]
Backref 610033664 parent 2 root 2 not found in extent tree
Backref 610033664 root 2 not referenced back 0xa46f250
Incorrect global backref count on 610033664 found 1 wanted 0
backpointer mismatch on [610033664 4096]
Backref 636481536 parent 2 root 2 not found in extent tree
Backref 636481536 root 2 not referenced back 0xa4d53c8
Incorrect global backref count on 636481536 found 1 wanted 0
backpointer mismatch on [636481536 4096]
Backref 673796096 parent 2 root 2 not found in extent tree
Backref 673796096 root 2 not referenced back 0xa474368
Incorrect global backref count on 673796096 found 1 wanted 0
backpointer mismatch on [673796096 4096]
Backref 717684736 parent 2 root 2 not found in extent tree
Backref 717684736 root 2 not referenced back 0x8793658
Incorrect global backref count on 717684736 found 1 wanted 0
backpointer mismatch on [717684736 4096]
Backref 739885056 parent 2 root 2 not found in extent tree
Backref 739885056 root 2 not referenced back 0xa501d20
Incorrect global backref count on 739885056 found 1 wanted 0
backpointer mismatch on [739885056 4096]
Backref 745107456 parent 2 root 2 not found in extent tree
Backref 745107456 root 2 not referenced back 0xa562260
Incorrect global backref count on 745107456 found 1 wanted 0
backpointer mismatch on [745107456 4096]
Backref 770273280 parent 2 root 2 not found in extent tree
Backref 770273280 root 2 not referenced back 0xa482138
Incorrect global backref count on 770273280 found 1 wanted 0
backpointer mismatch on [770273280 4096]
Backref 771325952 parent 2 root 2 not found in extent tree
Backref 771325952 root 2 not referenced back 0x8638260
Incorrect global backref count on 771325952 found 1 wanted 0
backpointer mismatch on [771325952 4096]
Backref 775409664 parent 2 root 2 not found in extent tree
Backref 775409664 root 2 not referenced back 0x81a3068
Incorrect global backref count on 775409664 found 1 wanted 0
backpointer mismatch on [775409664 4096]
Backref 775598080 parent 2 root 2 not found in extent tree
Backref 775598080 root 2 not referenced back 0x81a7540
Incorrect global backref count on 775598080 found 1 wanted 0
backpointer mismatch on [775598080 4096]
Backref 775700480 parent 2 root 2 not found in extent tree
Backref 775700480 root 2 not referenced back 0x8639100
Incorrect global backref count on 775700480 found 1 wanted 0
backpointer mismatch on [775700480 4096]
Backref 775729152 parent 2 root 2 not found in extent tree
Backref 775729152 root 2 not referenced back 0xa428248
Incorrect global backref count on 775729152 found 1 wanted 0
backpointer mismatch on [775729152 4096]
Backref 775761920 parent 2 root 2 not found in extent tree
Backref 775761920 root 2 not referenced back 0xa428318
Incorrect global backref count on 775761920 found 1 wanted 0
backpointer mismatch on [775761920 4096]
Backref 775892992 parent 2 root 2 not found in extent tree
Backref 775892992 root 2 not referenced back 0xa4283e8
Incorrect global backref count on 775892992 found 1 wanted 0
backpointer mismatch on [775892992 4096]
Backref 775909376 parent 2 root 2 not found in extent tree
Backref 775909376 root 2 not referenced back 0xa4284b8
Incorrect global backref count on 775909376 found 1 wanted 0
backpointer mismatch on [775909376 4096]
Backref 775950336 parent 2 root 2 not found in extent tree
Backref 775950336 root 2 not referenced back 0x86391d0
Incorrect global backref count on 775950336 found 1 wanted 0
backpointer mismatch on [775950336 4096]
Backref 776458240 parent 2 root 2 not found in extent tree
Backref 776458240 root 2 not referenced back 0xa428a68
Incorrect global backref count on 776458240 found 1 wanted 0
backpointer mismatch on [776458240 4096]
Backref 776753152 parent 2 root 2 not found in extent tree
Backref 776753152 root 2 not referenced back 0xa428728
Incorrect global backref count on 776753152 found 1 wanted 0
backpointer mismatch on [776753152 4096]
Backref 776765440 parent 2 root 2 not found in extent tree
Backref 776765440 root 2 not referenced back 0xa4288c8
Incorrect global backref count on 776765440 found 1 wanted 0
backpointer mismatch on [776765440 4096]
Backref 776851456 parent 2 root 2 not found in extent tree
Backref 776851456 root 2 not referenced back 0x8637b10
Incorrect global backref count on 776851456 found 1 wanted 0
backpointer mismatch on [776851456 4096]
Backref 777175040 parent 2 root 2 not found in extent tree
Backref 777175040 root 2 not referenced back 0x86392a0
Incorrect global backref count on 777175040 found 1 wanted 0
backpointer mismatch on [777175040 4096]
Backref 777465856 parent 2 root 2 not found in extent tree
Backref 777465856 root 2 not referenced back 0xa44bee8
Incorrect global backref count on 777465856 found 1 wanted 0
backpointer mismatch on [777465856 4096]
Backref 777633792 parent 2 root 2 not found in extent tree
Backref 777633792 root 2 not referenced back 0x8638330
Incorrect global backref count on 777633792 found 1 wanted 0
backpointer mismatch on [777633792 4096]
Backref 778764288 parent 2 root 2 not found in extent tree
Backref 778764288 root 2 not referenced back 0x8637ff0
Incorrect global backref count on 778764288 found 1 wanted 0
backpointer mismatch on [778764288 4096]
Backref 778891264 parent 2 root 2 not found in extent tree
Backref 778891264 root 2 not referenced back 0x8638740
Incorrect global backref count on 778891264 found 1 wanted 0
backpointer mismatch on [778891264 4096]
Backref 779280384 parent 2 root 2 not found in extent tree
Backref 779280384 root 2 not referenced back 0x8638e90
Incorrect global backref count on 779280384 found 1 wanted 0
backpointer mismatch on [779280384 4096]
Backref 779673600 parent 2 root 2 not found in extent tree
Backref 779673600 root 2 not referenced back 0xa428cd8
Incorrect global backref count on 779673600 found 1 wanted 0
backpointer mismatch on [779673600 4096]
Backref 781275136 parent 2 root 2 not found in extent tree
Backref 781275136 root 2 not referenced back 0xa429768
Incorrect global backref count on 781275136 found 1 wanted 0
backpointer mismatch on [781275136 4096]
Backref 781406208 parent 2 root 2 not found in extent tree
Backref 781406208 root 2 not referenced back 0xa517e80
Incorrect global backref count on 781406208 found 1 wanted 0
backpointer mismatch on [781406208 4096]
Backref 782848000 parent 2 root 2 not found in extent tree
Backref 782848000 root 2 not referenced back 0xa42c098
Incorrect global backref count on 782848000 found 1 wanted 0
backpointer mismatch on [782848000 4096]
Backref 783228928 parent 2 root 2 not found in extent tree
Backref 783228928 root 2 not referenced back 0xa429e78
Incorrect global backref count on 783228928 found 1 wanted 0
backpointer mismatch on [783228928 4096]
Backref 783355904 parent 2 root 2 not found in extent tree
Backref 783355904 root 2 not referenced back 0xa42a4f8
Incorrect global backref count on 783355904 found 1 wanted 0
backpointer mismatch on [783355904 4096]
Backref 783376384 parent 2 root 2 not found in extent tree
Backref 783376384 root 2 not referenced back 0xa42a698
Incorrect global backref count on 783376384 found 1 wanted 0
backpointer mismatch on [783376384 4096]
Backref 783405056 parent 2 root 2 not found in extent tree
Backref 783405056 root 2 not referenced back 0xa42a768
Incorrect global backref count on 783405056 found 1 wanted 0
backpointer mismatch on [783405056 4096]
Backref 784048128 parent 2 root 2 not found in extent tree
Backref 784048128 root 2 not referenced back 0xa429288
Incorrect global backref count on 784048128 found 1 wanted 0
backpointer mismatch on [784048128 4096]
Backref 784093184 parent 2 root 2 not found in extent tree
Backref 784093184 root 2 not referenced back 0xa429358
Incorrect global backref count on 784093184 found 1 wanted 0
backpointer mismatch on [784093184 4096]
Backref 784101376 parent 2 root 2 not found in extent tree
Backref 784101376 root 2 not referenced back 0xa4294f8
Incorrect global backref count on 784101376 found 1 wanted 0
backpointer mismatch on [784101376 4096]
Backref 784330752 parent 2 root 2 not found in extent tree
Backref 784330752 root 2 not referenced back 0xa426ab8
Incorrect global backref count on 784330752 found 1 wanted 0
backpointer mismatch on [784330752 4096]
Backref 784388096 parent 2 root 2 not found in extent tree
Backref 784388096 root 2 not referenced back 0xa42ab78
Incorrect global backref count on 784388096 found 1 wanted 0
backpointer mismatch on [784388096 4096]
Backref 784637952 parent 2 root 2 not found in extent tree
Backref 784637952 root 2 not referenced back 0xa42ac48
Incorrect global backref count on 784637952 found 1 wanted 0
backpointer mismatch on [784637952 4096]
Backref 784650240 parent 2 root 2 not found in extent tree
Backref 784650240 root 2 not referenced back 0xa42ade8
Incorrect global backref count on 784650240 found 1 wanted 0
backpointer mismatch on [784650240 4096]
Backref 784830464 parent 2 root 2 not found in extent tree
Backref 784830464 root 2 not referenced back 0x8637150
Incorrect global backref count on 784830464 found 1 wanted 0
backpointer mismatch on [784830464 4096]
Backref 785096704 parent 2 root 2 not found in extent tree
Backref 785096704 root 2 not referenced back 0xa42aeb8
Incorrect global backref count on 785096704 found 1 wanted 0
backpointer mismatch on [785096704 4096]
Backref 785129472 parent 2 root 2 not found in extent tree
Backref 785129472 root 2 not referenced back 0xa42aaa8
Incorrect global backref count on 785129472 found 1 wanted 0
backpointer mismatch on [785129472 4096]
Backref 785215488 parent 2 root 2 not found in extent tree
Backref 785215488 root 2 not referenced back 0xa42b058
Incorrect global backref count on 785215488 found 1 wanted 0
backpointer mismatch on [785215488 4096]
Backref 786165760 parent 2 root 2 not found in extent tree
Backref 786165760 root 2 not referenced back 0x81aada8
Incorrect global backref count on 786165760 found 1 wanted 0
backpointer mismatch on [786165760 4096]
Backref 786239488 parent 2 root 2 not found in extent tree
Backref 786239488 root 2 not referenced back 0xa44b528
Incorrect global backref count on 786239488 found 1 wanted 0
backpointer mismatch on [786239488 4096]
Backref 786452480 parent 2 root 2 not found in extent tree
Backref 786452480 root 2 not referenced back 0xa44be18
Incorrect global backref count on 786452480 found 1 wanted 0
backpointer mismatch on [786452480 4096]
Backref 786620416 parent 2 root 2 not found in extent tree
Backref 786620416 root 2 not referenced back 0xa42b468
Incorrect global backref count on 786620416 found 1 wanted 0
backpointer mismatch on [786620416 4096]
Backref 786780160 parent 2 root 2 not found in extent tree
Backref 786780160 root 2 not referenced back 0xa42b608
Incorrect global backref count on 786780160 found 1 wanted 0
backpointer mismatch on [786780160 4096]
Backref 786808832 parent 2 root 2 not found in extent tree
Backref 786808832 root 2 not referenced back 0xa503d68
Incorrect global backref count on 786808832 found 1 wanted 0
backpointer mismatch on [786808832 4096]
Backref 786870272 parent 2 root 2 not found in extent tree
Backref 786870272 root 2 not referenced back 0xa42b6d8
Incorrect global backref count on 786870272 found 1 wanted 0
backpointer mismatch on [786870272 4096]
Backref 786874368 parent 2 root 2 not found in extent tree
Backref 786874368 root 2 not referenced back 0xa42b7a8
Incorrect global backref count on 786874368 found 1 wanted 0
backpointer mismatch on [786874368 4096]
Backref 787447808 parent 2 root 2 not found in extent tree
Backref 787447808 root 2 not referenced back 0xa45a820
Incorrect global backref count on 787447808 found 1 wanted 0
backpointer mismatch on [787447808 4096]
Backref 787599360 parent 2 root 2 not found in extent tree
Backref 787599360 root 2 not referenced back 0xa42ca58
Incorrect global backref count on 787599360 found 1 wanted 0
backpointer mismatch on [787599360 4096]
Backref 787660800 parent 2 root 2 not found in extent tree
Backref 787660800 root 2 not referenced back 0xa42ba18
Incorrect global backref count on 787660800 found 1 wanted 0
backpointer mismatch on [787660800 4096]
Backref 787668992 parent 2 root 2 not found in extent tree
Backref 787668992 root 2 not referenced back 0xa42bae8
Incorrect global backref count on 787668992 found 1 wanted 0
backpointer mismatch on [787668992 4096]
Backref 787922944 parent 2 root 2 not found in extent tree
Backref 787922944 root 2 not referenced back 0x8638190
Incorrect global backref count on 787922944 found 1 wanted 0
backpointer mismatch on [787922944 4096]
Backref 788348928 parent 2 root 2 not found in extent tree
Backref 788348928 root 2 not referenced back 0x86385a0
Incorrect global backref count on 788348928 found 1 wanted 0
backpointer mismatch on [788348928 4096]
Backref 788504576 parent 2 root 2 not found in extent tree
Backref 788504576 root 2 not referenced back 0xa4e3460
Incorrect global backref count on 788504576 found 1 wanted 0
backpointer mismatch on [788504576 4096]
Backref 788635648 parent 2 root 2 not found in extent tree
Backref 788635648 root 2 not referenced back 0xa42bc88
Incorrect global backref count on 788635648 found 1 wanted 0
backpointer mismatch on [788635648 4096]
Backref 788688896 parent 2 root 2 not found in extent tree
Backref 788688896 root 2 not referenced back 0xa42be28
Incorrect global backref count on 788688896 found 1 wanted 0
backpointer mismatch on [788688896 4096]
Backref 788709376 parent 2 root 2 not found in extent tree
Backref 788709376 root 2 not referenced back 0xa42bef8
Incorrect global backref count on 788709376 found 1 wanted 0
backpointer mismatch on [788709376 4096]
Backref 788717568 parent 2 root 2 not found in extent tree
Backref 788717568 root 2 not referenced back 0xa42bfc8
Incorrect global backref count on 788717568 found 1 wanted 0
backpointer mismatch on [788717568 4096]
Backref 790511616 parent 2 root 2 not found in extent tree
Backref 790511616 root 2 not referenced back 0xa44bd48
Incorrect global backref count on 790511616 found 1 wanted 0
backpointer mismatch on [790511616 4096]
Backref 790540288 parent 2 root 2 not found in extent tree
Backref 790540288 root 2 not referenced back 0xa42c988
Incorrect global backref count on 790540288 found 1 wanted 0
backpointer mismatch on [790540288 4096]
Backref 790740992 parent 2 root 2 not found in extent tree
Backref 790740992 root 2 not referenced back 0x86389b0
Incorrect global backref count on 790740992 found 1 wanted 0
backpointer mismatch on [790740992 4096]
Backref 790753280 parent 2 root 2 not found in extent tree
Backref 790753280 root 2 not referenced back 0x8638cf0
Incorrect global backref count on 790753280 found 1 wanted 0
backpointer mismatch on [790753280 4096]
Backref 792076288 parent 2 root 2 not found in extent tree
Backref 792076288 root 2 not referenced back 0x86384d0
Incorrect global backref count on 792076288 found 1 wanted 0
backpointer mismatch on [792076288 4096]
Backref 793780224 parent 2 root 2 not found in extent tree
Backref 793780224 root 2 not referenced back 0x81baea8
Incorrect global backref count on 793780224 found 1 wanted 0
backpointer mismatch on [793780224 4096]
Backref 793821184 parent 2 root 2 not found in extent tree
Backref 793821184 root 2 not referenced back 0x8637080
Incorrect global backref count on 793821184 found 1 wanted 0
backpointer mismatch on [793821184 4096]
Backref 793853952 parent 2 root 2 not found in extent tree
Backref 793853952 root 2 not referenced back 0x86378a0
Incorrect global backref count on 793853952 found 1 wanted 0
backpointer mismatch on [793853952 4096]
ref mismatch on [31078330368 20480] extent item 1, found 0
repair deleting extent record: key 31078330368 168 20480
Incorrect local backref count on 31078330368 root 5 owner 3024239 offset 0 found 0 wanted 1 back 0x8f00580
backpointer mismatch on [31078330368 20480]
owner ref check failed [31078330368 20480]
repaired damaged extent references
checking fs roots
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
Csum didn't match

Regards,
Frederic
Am Freitag, 8. März 2013 schrieb Frédéric COIFFIER:
> Hi,

Hi Frédéric,

> I'm using Linux 3.7.6 (Gentoo Linux) with btrfs-progs-0.20_rc1_p56 and for a few days I have had some uncorrectable errors:
>
> # btrfs scrub status /
> scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
>         scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
>         total bytes scrubbed: 31.02GB with 6 errors
>         error details: csum=6
>         corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>
> I don't know what produced this error (maybe a hard reset or a power cut) but I use an old non-SSD hard disk.

Is this disk still fine? Is smartctl -a happy with it?

> I discovered this problem through several errors in dmesg when I try to access a file:
>
> [ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
> [ 2985.169191] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[…]
> If I restart a btrfs scrub, I get these messages:
>
> [ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
> [ 3047.835134] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
> [ 3047.835137] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0
> [ 3047.953751] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[…]
> I tried a LiveCD to run a btrfsck [I have to check its version] but it segfaults during the test.
>
> Today, I can't remove the file (and I can't delete its directory); updatedb runs for hours when it tries to read this file.
> So, what is the best way to recover from these errors (as I think that some files are definitely lost)?
> I would like to identify the corrupted files and delete them.

I thought that with recent kernels BTRFS would report which file is affected, but here it doesn't seem to.

I think it's also possible to find the file from the block number, but I don't remember the direct way to do it. I only know the other way around, with filefrag -v or hdparm --fibmap - although, thinking about it, going the other direction needs knowledge of the filesystem structure… Maybe it's possible to map something in the output of btrfs-debug-tree to the output above.

But I really think BTRFS displays the affected filename these days. So if it does not, maybe it's some metadata that is affected? The output of btrfsck hints at that, and so does the fact that you can't remove the file. What happens when you try to remove the file? Do you get an input/output error or something like that?

Maybe someone else can help with that.

Aside from that: they're called uncorrectable errors for a reason :)

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
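[Editorial note: the filefrag -v route Martin mentions gives, per file, the physical extents the file occupies; inverting that table answers "which file owns block X". A minimal sketch, assuming the filefrag output has already been reduced to "name start length" triples in 4 KiB blocks — the file names and numbers below are made up for illustration:]

```shell
# Hypothetical pre-parsed `filefrag -v` data: file name, first physical
# 4 KiB block of an extent, extent length in blocks. Real filefrag output
# needs a bit more parsing to reach this shape.
extents='fileA 66571 4
fileB 132800 16
fileC 66448 120'

target=66460   # physical block we want to attribute to a file

# Print every file whose extent range [start, start+len) covers the target.
owner=$(printf '%s\n' "$extents" \
  | awk -v t="$target" 't >= $2 && t < $2 + $3 { print $1 }')
printf '%s\n' "$owner"
```

Note also that sufficiently recent btrfs-progs offer `btrfs inspect-internal logical-resolve` for the direct logical-address-to-path direction, though that works for data extents, and the errors in this thread are on metadata leaves.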
Am Freitag, 8. März 2013 schrieb Frédéric COIFFIER:
> Today, I can't remove the file (and I can't delete its directory);
> updatedb runs for hours when it tries to read this file. So, what is
> the best way to recover from these errors (as I think that some files are
> definitely lost)? I would like to identify the corrupted files and
> delete them.

Well, if nothing else works, you can still make a backup, diff it against an older backup to possibly recover the corrupted files (or at least older versions of them), and redo the filesystem. After verifying that the hardware works okay :)

As said, these errors are called uncorrectable for a reason. When they happen on file data it should be possible to delete the offending file, but then AFAIK BTRFS also reports which file they happen on.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
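[Editorial note: the backup-and-diff approach Martin suggests can be sketched with throwaway directories standing in for the old backup and the fresh, possibly corrupt copy; only the mismatching file is reported. All names and contents here are illustrative:]

```shell
# Two stand-in backup trees (hypothetical contents, for illustration only).
old=$(mktemp -d); new=$(mktemp -d)
printf 'good\n' > "$old/a.c"; printf 'good\n' > "$new/a.c"
printf 'good\n' > "$old/b.c"; printf 'bad\n'  > "$new/b.c"   # the damaged copy

# diff -rq exits non-zero when the trees differ, so guard it; it lists
# only the files whose contents changed between the two backups.
changed=$(diff -rq "$old" "$new" || true)
printf '%s\n' "$changed"
```

On a real system the "new" tree would come from an rsync of the failing btrfs volume, and any file that differs from (or is missing relative to) the old backup is a candidate for the corruption.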
Hi Martin,

Thank you for your reply.

Le samedi 16 mars 2013 19:16:54 Martin Steigerwald a écrit :
> Am Freitag, 8. März 2013 schrieb Frédéric COIFFIER:
> > # btrfs scrub status /
> > scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
> >         scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
> >         total bytes scrubbed: 31.02GB with 6 errors
> >         error details: csum=6
> >         corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
> >
> > I don't know what produced this error (maybe a hard reset or a power cut) but I use an old non-SSD hard disk.
>
> Is this disk still fine? Is smartctl -a happy with it?

It is old but it seems to be fine:

  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20238
...
195 Hardware_ECC_Recovered  0x001a   057   055   000    Old_age   Always       -       63508940
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
...
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       15811         -
# 2  Short offline       Aborted by host               20%       13984         -
# 3  Short offline       Completed without error       00%       13984         -
# 4  Short offline       Completed without error       00%         187         -

> > Today, I can't remove the file (and I can't delete its directory); updatedb runs for hours when it tries to read this file.
> > So, what is the best way to recover from these errors (as I think that some files are definitely lost)?
> > I would like to identify the corrupted files and delete them.
>
> I thought that with recent kernels BTRFS would report which file is
> affected, but here it doesn't seem to.

Yes, I read on a mailing list that a patch was proposed, but with 3.8.1 it doesn't work.

> I think it's also possible to find the file from the block number, but I
> don't remember the direct way to do it. I only know the other way around,
> with filefrag -v or hdparm --fibmap - although, thinking about it,
> going the other direction needs knowledge of the filesystem structure… Maybe it's
> possible to map something in the output of btrfs-debug-tree to the output above.

In fact, yesterday I made an rsync from btrfs to ext4, and rsync reported "Stale NFS file handle" errors for these files. So identifying them is no longer a problem. The most annoying thing is that I can't delete these files, so the only way to solve this is to replace the filesystem.

> But I really think BTRFS displays the affected filename these days. So
> if it does not, maybe it's some metadata that is affected? The output of btrfsck
> hints at that, and so does the fact that you can't remove the file. What happens
> when you try to remove the file? Do you get an input/output error or
> something like that?

# rm -rf *
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
...

> Maybe someone else can help with that.
>
> Aside from that: they're called uncorrectable errors for a reason :)

Yes, I absolutely agree that some files can't be recovered, but btrfsck should offer to repair these errors (like fsck.ext4 does), even if we lose some data. In fact, I never had this kind of problem with the ext filesystems.

Regards,
Frederic
On Mar 20, 2013, at 7:33 AM, Frédéric COIFFIER <frederic.coiffier@free.fr> wrote:> > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940 > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1 > 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 > 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0With such high ECC recovered events, I suspect SDC. The value is in manufacturer''s tolerance to not fail the drive outright, but the ECC in a consumer SATA drive isn''t fool proof. It will fail to detect some errors, and report bad data back to the file system. It will detect and incorrectly "correct" others. Even if most error is detected and correctly corrected, bottom line is you have a file system that knows better and it''s saying something is significantly wrong. If you''re going to continue to use the drive, I would at least use hdparm to issue ATA enhanced security erase unit. Then I''d take a smartctl -x capture for reference. Then do an extended offline smart test with -t long, which this drive has never had in its lifetime. And another smartctl -x to compare to the reference and see if either the test completed or failed, and whether any of the attributes changed appreciably during the offline test. Otherwise get a replacement. The one off UDMA error isn''t a media error, but communication between drive and controller, I wouldn''t be overly concerned with that.> The most annoying thing is that we can''t delete these files. So, the only way to solve these problems is to replace the filesystem.The storage media isn''t reliable. Replacing the file system eventually will get you right back where you are now, except in a case of multiple devices with a reliable 2nd device. 
Chris Murphy
On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > But I really think BTRFS displays the affected filename by now. So
> > maybe, if it does not, some metadata is affected? The output of btrfsck
> > hints at that, and so does the fact that you can't remove the file.
> > What happens if you try to remove the file? Do you get an input/output
> > error or something like that?
>
> # rm -rf *
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> ...

You are trying to remove the files from an NFS client. "Stale NFS file handle" just means that the NFS handle is no longer valid. NFS <v4 clients refer to files by a file handle composed of the filesystem id and the inode number. Maybe something changed in there?

Anyway, to find the real error message, it is necessary to try to delete the files on the server, because even if there is a real BTRFS issue, the NFS client likely won't report helpful error messages.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
>> # rm -rf *
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> ...
>
> You are trying to remove the files from an NFS client. "Stale NFS file
> handle" just means that the NFS handle is no longer valid. NFS <v4
> clients refer to files by a file handle composed of the filesystem id
> and the inode number. Maybe something changed in there?
>
> Anyway, to find the real error message, it is necessary to try to delete
> the files on the server, because even if there is a real BTRFS issue,
> the NFS client likely won't report helpful error messages.

Don't read too much into that "Stale NFS file handle" message; ESTALE doesn't imply anything about NFS being involved, despite the standard error string for that value.
On Wed, 20 Mar 2013 12:19:18 -0600 Chris Murphy <lists@colorremedies.com> wrote:

> > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
>
> With such high ECC recovered events, I suspect SDC.

If it's a Seagate drive, this is absolutely normal. All Seagate drives show a high value in the SMART Hardware_ECC_Recovered attribute.

--
With respect,
Roman
On Mar 20, 2013, at 1:24 PM, Roman Mamedov <rm@romanrm.ru> wrote:
> On Wed, 20 Mar 2013 12:19:18 -0600
> Chris Murphy <lists@colorremedies.com> wrote:
>
> > > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
> >
> > With such high ECC recovered events, I suspect SDC.
>
> If it's a Seagate drive, this is absolutely normal.
> All Seagate drives have a high value in SMART Hardware_ECC_Recovered.

http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/Seagate-s-Seek-Error-Rate-Raw-Read-Error-Rate-and-Hardware-ECC/td-p/122382
http://www.silentpcreview.com/forums/viewtopic.php?t=57212

If I read this correctly, the raw read error rate and hardware ECC recovered values are sector counts, so they should be the same. Nevertheless, the file system isn't happy about checksums: it's not that it isn't finding the checksum data, it's finding errors with it.

Chris Murphy
Hi Martin,

On Wednesday, 20 March 2013 at 19:59:54, Martin Steigerwald wrote:
> On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > > But I really think BTRFS displays the affected filename by now. So
> > > maybe, if it does not, some metadata is affected? The output of
> > > btrfsck hints at that, and so does the fact that you can't remove
> > > the file. What happens if you try to remove the file? Do you get an
> > > input/output error or something like that?
> >
> > # rm -rf *
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > ...
>
> You are trying to remove the files from an NFS client. Stale NFS file
> handle just means that the NFS handle is no longer valid.

Absolutely not. I'm not using NFS, and I am trying to remove the files locally. It seems that btrfs returns a strange ESTALE errno...
grep -rsn ESTALE fs/btrfs/
fs/btrfs/inode.c:2412:	if (ret && ret != -ESTALE)
fs/btrfs/inode.c:2415:	if (ret == -ESTALE && root == root->fs_info->tree_root) {
fs/btrfs/inode.c:2451:	if (ret == -ESTALE) {
fs/btrfs/inode.c:4273:	inode = ERR_PTR(-ESTALE);
fs/btrfs/export.c:71:	return ERR_PTR(-ESTALE);
fs/btrfs/export.c:104:	return ERR_PTR(-ESTALE);

This error seems to be common (even if I can't see any recent reports):
http://www.google.fr/search?q=btrfs+Stale+NFS+file+handle

Regards,
Frederic
Hi Roman,

On Thursday, 21 March 2013 at 01:24:14, Roman Mamedov wrote:
> On Wed, 20 Mar 2013 12:19:18 -0600
> Chris Murphy <lists@colorremedies.com> wrote:
>
> > > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
> >
> > With such high ECC recovered events, I suspect SDC.
>
> If it's a Seagate drive, this is absolutely normal.
> All Seagate drives have a high value in SMART Hardware_ECC_Recovered.

You're right, it's a Seagate:
Model Family: Seagate Barracuda 7200.10
Device Model: ST3320620AS

Regards,
Frederic
On Thursday, 21 March 2013, Frédéric COIFFIER wrote:
> Hi Martin,
>
> On Wednesday, 20 March 2013 at 19:59:54, Martin Steigerwald wrote:
> > On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > > > But I really think BTRFS displays the affected filename by now. So
> > > > maybe, if it does not, some metadata is affected? The output of
> > > > btrfsck hints at that, and so does the fact that you can't remove
> > > > the file. What happens if you try to remove the file? Do you get an
> > > > input/output error or something like that?
> > >
> > > # rm -rf *
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > ...
> >
> > You are trying to remove the files from an NFS client. Stale NFS file
> > handle just means that the NFS handle is no longer valid.
>
> Absolutely not. I'm not using NFS, and I am trying to remove the files
> locally. It seems that btrfs returns a strange ESTALE errno...
> grep -rsn ESTALE fs/btrfs/
> fs/btrfs/inode.c:2412:	if (ret && ret != -ESTALE)
> fs/btrfs/inode.c:2415:	if (ret == -ESTALE && root == root->fs_info->tree_root) {
> fs/btrfs/inode.c:2451:	if (ret == -ESTALE) {
> fs/btrfs/inode.c:4273:	inode = ERR_PTR(-ESTALE);
> fs/btrfs/export.c:71:	return ERR_PTR(-ESTALE);
> fs/btrfs/export.c:104:	return ERR_PTR(-ESTALE);
>
> This error seems to be common (even if I can't see any recent reports):
> http://www.google.fr/search?q=btrfs+Stale+NFS+file+handle

Thanks for the notice. Well, I thought one could take the error message literally; I have only ever seen it with NFS, and NFS is named in the error message itself. I think the error message is at least misleading.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
On Mar 21, 2013, at 2:57 AM, Frédéric COIFFIER <frederic.coiffier@free.fr> wrote:
> Hi Roman,
>
> On Thursday, 21 March 2013 at 01:24:14, Roman Mamedov wrote:
> > On Wed, 20 Mar 2013 12:19:18 -0600
> > Chris Murphy <lists@colorremedies.com> wrote:
> >
> > > > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
> > >
> > > With such high ECC recovered events, I suspect SDC.
> >
> > If it's a Seagate drive, this is absolutely normal.
> > All Seagate drives have a high value in SMART Hardware_ECC_Recovered.
>
> You're right: it's a Seagate.

Your first post, the btrfs scrub, contains checksum errors in metadata. It reports two logical values at four sector values, which tells me this is the raid1 metadata profile. And because this isn't a fixable error, it sounds like the mirrored metadata copies agree with each other, but the data itself has changed. I don't think that's due to a reset or power loss during a write. The source of the problem sounds to me like SDC: some parts of the drive have bad sectors, the drive is returning wrong data, and the FS knows this.

Previously:
> Yes, I absolutely agree that we can't recover some files, but btrfsck
> should offer to repair these errors (like fsck.ext4 does), even if we
> lose some data. In fact, I have never had this kind of problem with ext
> filesystems.

It's not a fair comparison. ext is stable; btrfs is not. ext's fsck repairs by default; btrfs's does not. Nobody suggests that users ask the devs on a list before running an fsck repair on ext, but that is the case for btrfs, and so far no dev has suggested using the --repair flag. I don't know whether it would help get the file system to allow the deletion of the corrupt files. There have been many changes since kernel 3.7.4, so I suspect a dev would want you to try something newer, and much newer progs as well.
In any case, I would still use enhanced security erase on the drive, then do a smartctl -t long (extended offline) test, and then make sure it completes after the estimated time with smartctl -a or -x.

Chris Murphy