Hello,

On Saturday I added another disk to my Btrfs filesystem and started a rebalance to convert it from m:DUP/d:single to m:RAID1/d:RAID1. I quickly noticed that it was filling my logs with "btrfs: block rsv returned -28" messages and "slowpath" warnings from "use_block_rsv+0x198/0x1a0 [btrfs]" (http://pastebin.com/HF6u3g31). It also seemed to be stuck: after around 2 hours the "balance status" command showed no progress at all, so I went to the #btrfs IRC channel to ask what I should do. I was told to cancel it. I ran "balance cancel", but that got stuck too. Then I noticed in the "fi df" output that metadata DUP usage was slowly going down while RAID1 usage was slowly going up. Very slowly. So I waited, and eventually the cancel went through.

I decided to resume the conversion, adding "soft" to the command ("balance start -mconvert=raid1,soft -dconvert=raid1,soft"), and left it running overnight. On Sunday the balance suddenly stopped, but it hadn't finished. It turned out it had run out of space, because the total metadata allocation exploded from less than 7 GB to above 50 GB, while only about 6.35 GB of it was actually in use:

    Data, RAID1: total=395.96GB, used=395.82GB
    Data: total=8.00MB, used=8.00MB
    System, DUP: total=8.00MB, used=72.00KB
    System: total=4.00MB, used=0.00
    Metadata, RAID1: total=51.50GB, used=6.35GB
    Metadata, DUP: total=1.00GB, used=501.86MB
    Metadata: total=8.00MB, used=0.00

There were also some worrying messages in the log: http://pastebin.com/ceka12NM.

I rebooted my computer and the balance resumed its work by itself. After a while it stopped again. There were no messages in the log, but it didn't finish either. I started it again, and after a while the command stopped with a "No such file or directory" error. I started it once more and got the same error. The log contains only:

    [83690.889986] btrfs: relocating block group 29360128 flags 36
    [87480.359914] btrfs: relocating block group 29360128 flags 36
    [88893.850409] btrfs: relocating block group 29360128 flags 36

I unmounted the FS and ran btrfsck.
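The numeric values in these kernel messages can be decoded by hand. Below is a small Python sketch, assuming the Linux errno numbering and the block-group flag bits from the Btrfs on-disk format of the kernel 3.7 era (DATA=1, SYSTEM=2, METADATA=4, RAID0=8, RAID1=16, DUP=32, RAID10=64); `decode_block_group_flags` and `decode_errno` are hypothetical helpers, not part of btrfs-progs. Under those assumptions, -28 is ENOSPC, and the block group the balance keeps retrying (flags 36) is a metadata DUP chunk, which would fit a conversion that still had DUP metadata left to move.

```python
import errno
import os

# Block-group type/profile bits as defined in the Btrfs on-disk format
# (kernel 3.7 era; the RAID5/RAID6 bits were added later and are omitted here).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_block_group_flags(flags: int) -> list[str]:
    """Return the names of the bits set in a 'relocating block group ... flags N' value."""
    names = [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items()) if flags & bit]
    # Report any bits this table does not know about instead of silently dropping them.
    unknown = flags & ~sum(BLOCK_GROUP_FLAGS)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


def decode_errno(ret: int) -> str:
    """Map a negative kernel return code such as -28 to its symbolic errno name."""
    code = -ret
    return f"{errno.errorcode.get(code, '?')} ({os.strerror(code)})"


if __name__ == "__main__":
    # "btrfs: block rsv returned -28"
    print(decode_errno(-28))
    # "btrfs: relocating block group 29360128 flags 36"
    print(decode_block_group_flags(36))
```

On Linux this prints `ENOSPC (No space left on device)` followed by `['METADATA', 'DUP']`.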
It found some extent errors:

    checking extents
    ref mismatch on [711069696 4096] extent item 1, found 0
    Backref 711069696 root 8 not referenced back 0x1e6d0590
    Incorrect global backref count on 711069696 found 1 wanted 0
    backpointer mismatch on [711069696 4096]
    owner ref check failed [711069696 4096]
    ref mismatch on [848388096 4096] extent item 1, found 0
    Backref 848388096 root 8 not referenced back 0x36311b90
    Incorrect global backref count on 848388096 found 1 wanted 0
    backpointer mismatch on [848388096 4096]
    owner ref check failed [848388096 4096]
    Errors found in extent allocation tree

...and a lot of these errors:

    checking fs roots
    root 823 inode 222165 errors 400
    root 823 inode 390623 errors 400
    root 838 inode 1261335 errors 400
    [...]

Full error log here: http://pastebin.com/HyjmWBNA

What should I do next? I'd like to repair it in place if possible. The FS contains mostly daily backups, unimportant virtual machine images, Steam with games, etc. Repairing it would save me from redownloading gigabytes of data over the internet (I could just run my next rsync backups with "--checksum", verify my Steam game files, and that's it), or from looking for another hard disk to copy it somewhere.

Regards

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
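To see which trees the broken extents belong to, the "Backref ... root N" lines can be collected and the root ids mapped to the well-known tree objectids of the Btrfs on-disk format (1 root tree, 2 extent tree, 3 chunk tree, 4 device tree, 5 FS tree, 7 checksum tree, 8 quota tree; ids from 256 upwards are subvolumes and snapshots). The Python sketch below is a hypothetical helper that assumes the btrfsck message format quoted above. If the mapping is right, both mismatched 4 KiB extents are claimed by the quota tree, which may be relevant if qgroups were ever enabled on this filesystem, and roots 823 and 838 in the fs-roots errors are subvolumes.

```python
import re
from collections import defaultdict

# Well-known tree objectids from the Btrfs on-disk format (kernel 3.7 era).
WELL_KNOWN_TREES = {
    1: "root tree",
    2: "extent tree",
    3: "chunk tree",
    4: "device tree",
    5: "FS tree (top-level subvolume)",
    7: "checksum tree",
    8: "quota tree",
}
FIRST_SUBVOLUME_ID = 256  # subvolume and snapshot roots are numbered from here

# Assumed btrfsck message format: "Backref <bytenr> root <id> not referenced back <ptr>"
BACKREF_RE = re.compile(r"Backref (\d+) root (\d+) not referenced back")


def tree_name(root_id: int) -> str:
    """Give a human-readable name for a tree/root objectid."""
    if root_id in WELL_KNOWN_TREES:
        return WELL_KNOWN_TREES[root_id]
    if root_id >= FIRST_SUBVOLUME_ID:
        return f"subvolume {root_id}"
    return f"tree {root_id}"


def extents_by_owner(log: str) -> dict[str, list[int]]:
    """Group the byte numbers of unreferenced backrefs by the tree that claims them."""
    owners: dict[str, list[int]] = defaultdict(list)
    for match in BACKREF_RE.finditer(log):
        bytenr, root_id = int(match.group(1)), int(match.group(2))
        owners[tree_name(root_id)].append(bytenr)
    return dict(owners)


if __name__ == "__main__":
    sample = """\
ref mismatch on [711069696 4096] extent item 1, found 0
Backref 711069696 root 8 not referenced back 0x1e6d0590
ref mismatch on [848388096 4096] extent item 1, found 0
Backref 848388096 root 8 not referenced back 0x36311b90
"""
    print(extents_by_owner(sample))
```

For the two extents quoted above this prints `{'quota tree': [711069696, 848388096]}`.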
As I was in a hurry, I forgot to mention some things:

> I rebooted my computer and the balance started continuing its work

Of course, after noticing there was no space left, I deleted around 15 GB of data to free some space, then tried to restart the balance. It didn't work, so I checked the logs, noticed the problems and rebooted.

> I unmounted the FS and run btrfsck.

I had also run a scrub before that, and it didn't find any errors. My kernel is from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc7-raring/

The file system may have been corrupted a few weeks earlier: I enabled qgroups to test how they work, but soon started getting strange memory allocation failures from google-chrome, from btrfs itself, while trying to hibernate and while using VirtualBox, as well as some hard lock-ups. I disabled qgroups and everything went back to normal. I can dig up some kernel logs when I get back home.

Regards
When I run btrfsck in repair mode, it fails to fix the corruption (log below). Is there any other version of btrfsck besides the one at git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git that I could try?

    (gdb) run
    Starting program: /home/pp/btrfs-progs/btrfsck --repair /dev/mapper/pp-dysk4
    enabling repair mode
    ERROR: unable to scan the device '/dev/sda7' - Device or resource busy
    ERROR: unable to scan the device '/dev/sdc7' - Device or resource busy
    ERROR: unable to scan the device '/dev/sda7' - Device or resource busy
    ERROR: unable to scan the device '/dev/sdc7' - Device or resource busy
    checking extents
    ref mismatch on [711069696 4096] extent item 1, found 0
    btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

    Program received signal SIGABRT, Aborted.
    0x00007ffff784c425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    (gdb) bt
    #0  0x00007ffff784c425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0x00007ffff784fb8b in __GI_abort () at abort.c:91
    #2  0x00007ffff78450ee in __assert_fail_base (fmt=<optimized out>, assertion=0x43d21e "!(ret)", file=0x43d210 "extent-tree.c", line=<optimized out>, function=<optimized out>) at assert.c:94
    #3  0x00007ffff7845192 in __GI___assert_fail (assertion=0x43d21e "!(ret)", file=0x43d210 "extent-tree.c", line=2549, function=0x43d870 <__PRETTY_FUNCTION__.7967> "btrfs_reserve_extent") at assert.c:103
    #4  0x000000000041ef7d in btrfs_reserve_extent (trans=0x64fed0, root=0x64f6e0, num_bytes=4096, empty_size=0, hint_byte=433791696896, search_end=18446744073709551615, ins=0x7fffffffdd00, data=52) at extent-tree.c:2549
    #5  0x000000000041f218 in alloc_tree_block (trans=0x64fed0, root=0x64f6e0, num_bytes=4096, root_objectid=2, generation=289502, flags=0, key=0x7fffffffdd80, level=3, empty_size=0, hint_byte=433791696896, search_end=18446744073709551615, ins=0x7fffffffdd00) at extent-tree.c:2612
    #6  0x000000000041f426 in btrfs_alloc_free_block (trans=0x64fed0, root=0x64f6e0, blocksize=4096, root_objectid=2, key=0x7fffffffdd80, level=3, hint=433791696896, empty_size=0) at extent-tree.c:2658
    #7  0x000000000040d504 in __btrfs_cow_block (trans=0x64fed0, root=0x64f6e0, buf=0x6749c0, parent=0x0, parent_slot=0, cow_ret=0x7fffffffde68, search_start=433791696896, empty_size=0) at ctree.c:321
    #8  0x000000000040d950 in btrfs_cow_block (trans=0x64fed0, root=0x64f6e0, buf=0x6749c0, parent=0x0, parent_slot=0, cow_ret=0x7fffffffde68) at ctree.c:410
    #9  0x000000000040f464 in btrfs_search_slot (trans=0x64fed0, root=0x64f6e0, key=0x7fffffffdec0, p=0x30409b20, ins_len=0, cow=1) at ctree.c:1214
    #10 0x000000000040a246 in delete_extent_records (trans=0x64fed0, root=0x64f6e0, path=0x30409b20, bytenr=711069696, new_len=4096) at btrfsck.c:2858
    #11 0x000000000040aa2d in fixup_extent_refs (trans=0x64fed0, info=0x6513e0, rec=0xf7de760) at btrfsck.c:3078
    #12 0x000000000040b1a3 in check_extent_refs (trans=0x64fed0, root=0xa0fa70, extent_cache=0x7fffffffe080, repair=1) at btrfsck.c:3317
    #13 0x000000000040b89f in check_extents (trans=0x64fed0, root=0xa0fa70, repair=1) at btrfsck.c:3461
    #14 0x000000000040bcd7 in main (ac=1, av=0x7fffffffe4d8) at btrfsck.c:3573
    (gdb)