thr3ads.net - Btrfs devel - various problems converting to RAID1 [Dec 2012]

Piotr Pawłow

2012-Dec-04 08:50 UTC

various problems converting to RAID1

Hello,

On Saturday I added another disk to my BTRFS filesystem. I started a 
rebalance to convert it from m:DUP/d:single to m:RAID1/d:RAID1.

I quickly noticed it started filling my logs with "btrfs: block rsv 
returned -28" and "slowpath" warnings from "use_block_rsv+0x198/0x1a0 
[btrfs]" (http://pastebin.com/HF6u3g31).

It also seemed to be stuck. After around 2 hours with no progress at 
all reported by the "balance status" command, I went to the #btrfs IRC 
channel to ask what I should do. I was told to cancel it, so I ran 
"balance cancel", but that was stuck too. Then I noticed from the 
"fi df" output that metadata DUP usage was slowly going down while 
RAID1 was slowly going up. Very slowly. So I waited. Finally the cancel 
worked.

I decided to resume the conversion (adding "soft" to the command, like 
this: "balance start -mconvert=raid1,soft -dconvert=raid1,soft"), and 
left it running overnight.

On Sunday the balance suddenly stopped, but it wasn't finished. It 
turned out it had run out of space, because the metadata total space 
had exploded from less than 7 GB to over 50 GB:

Data, RAID1: total=395.96GB, used=395.82GB
Data: total=8.00MB, used=8.00MB
System, DUP: total=8.00MB, used=72.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=51.50GB, used=6.35GB
Metadata, DUP: total=1.00GB, used=501.86MB
Metadata: total=8.00MB, used=0.00
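
The gap between "total" and "used" in the output above can be worked out with a short script. This is only a minimal sketch: it assumes the "fi df" line format shown above and that the units are 1024-based, and the name parse_fi_df is made up for illustration.

```python
import re

# Sample copied from the "fi df" output above.
FI_DF = """\
Data, RAID1: total=395.96GB, used=395.82GB
Data: total=8.00MB, used=8.00MB
System, DUP: total=8.00MB, used=72.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=51.50GB, used=6.35GB
Metadata, DUP: total=1.00GB, used=501.86MB
Metadata: total=8.00MB, used=0.00
"""

# Assumption: the tool prints binary units (1 GB = 1024**3 bytes).
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

FI_DF_LINE = re.compile(
    r"^(\w+)(?:, (\w+))?: total=([\d.]+)([KMGT]B)?, used=([\d.]+)([KMGT]B)?$"
)


def parse_fi_df(text):
    """Return one dict per line with type, profile, total and used in bytes."""
    rows = []
    for line in text.splitlines():
        m = FI_DF_LINE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a "fi df" line
        kind, profile, total, total_unit, used, used_unit = m.groups()
        rows.append({
            "type": kind,
            "profile": profile or "single",  # no profile listed means single
            "total": float(total) * UNITS[total_unit or ""],
            "used": float(used) * UNITS[used_unit or ""],
        })
    return rows


rows = parse_fi_df(FI_DF)
for row in rows:
    slack_gib = (row["total"] - row["used"]) / 1024**3
    print(f'{row["type"]:<8} {row["profile"]:<6} '
          f'allocated but unused: {slack_gib:8.2f} GiB')
```

For the "Metadata, RAID1" line this gives about 45 GiB allocated but unused, which is the space the balance ran out of.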

There were also some worrying messages in the log: 
http://pastebin.com/ceka12NM.

I rebooted my computer and the balance resumed by itself. After a while 
it stopped again. No messages in the log, but it didn't finish either.

I started it again, and after a while the command stopped with a "No 
such file or directory" error. Started again, same error. In the log 
there's only:

[83690.889986] btrfs: relocating block group 29360128 flags 36
[87480.359914] btrfs: relocating block group 29360128 flags 36
[88893.850409] btrfs: relocating block group 29360128 flags 36
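
The "flags 36" value in these messages can be decoded into block group type and profile bits. A minimal sketch, assuming the kernel prints the value in decimal and using the block group flag bits from the btrfs on-disk format; the function name decode_block_group_flags is made up for illustration.

```python
# Block group flag bits as defined by the btrfs on-disk format.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_block_group_flags(flags):
    """Return the names of all known bits set in flags, joined with '|'."""
    names = [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]
    return "|".join(names) if names else "none"


# Assumption: "flags 36" in the log line is a decimal number.
print(decode_block_group_flags(36))  # METADATA|DUP
```

So the block group being relocated over and over appears to be a metadata DUP one, which would be consistent with the "Metadata, DUP" line still present in the "fi df" output.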

I unmounted the FS and ran btrfsck. It found some extent errors:

checking extents
ref mismatch on [711069696 4096] extent item 1, found 0
Backref 711069696 root 8 not referenced back 0x1e6d0590
Incorrect global backref count on 711069696 found 1 wanted 0
backpointer mismatch on [711069696 4096]
owner ref check failed [711069696 4096]
ref mismatch on [848388096 4096] extent item 1, found 0
Backref 848388096 root 8 not referenced back 0x36311b90
Incorrect global backref count on 848388096 found 1 wanted 0
backpointer mismatch on [848388096 4096]
owner ref check failed [848388096 4096]
Errors found in extent allocation tree

...and a lot of these errors:

checking fs roots
root 823 inode 222165 errors 400
root 823 inode 390623 errors 400
root 838 inode 1261335 errors 400
[...]
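
With this many lines it helps to count them per root and per error code. A minimal sketch, assuming every line has the "root N inode N errors X" shape shown above; the error code is treated as an opaque string, and the name tally_inode_errors is made up for illustration.

```python
import re
from collections import Counter

# Sample copied from the btrfsck output above.
SAMPLE = """\
root 823 inode 222165 errors 400
root 823 inode 390623 errors 400
root 838 inode 1261335 errors 400
"""

INODE_ERR_LINE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-fA-F]+)$")


def tally_inode_errors(text):
    """Count affected inodes per root and occurrences of each error code."""
    per_root = Counter()
    per_code = Counter()
    for line in text.splitlines():
        m = INODE_ERR_LINE.match(line.strip())
        if not m:
            continue  # ignore headers such as "checking fs roots"
        root, _inode, code = m.groups()
        per_root[int(root)] += 1
        per_code[code] += 1
    return per_root, per_code


per_root, per_code = tally_inode_errors(SAMPLE)
for root, count in sorted(per_root.items()):
    print(f"root {root}: {count} inode(s) with errors")
for code, count in sorted(per_code.items()):
    print(f"error code {code}: {count} occurrence(s)")
```

Run against the full log, the same function would show whether the errors are spread over all roots or concentrated in a few.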

Full error log here: http://pastebin.com/HyjmWBNA

What should I do next? I'd like to repair it in place if possible. The 
FS contains mostly daily backups, unimportant virtual machine images, 
Steam with games etc. Repairing it would save me from redownloading 
gigabytes of data over the internet (I can just run my next rsync 
backups with "--checksum", verify my Steam game files, and that's it), 
or looking for another hard disk to copy it to.

Regards
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Piotr Pawłow

2012-Dec-04 10:22 UTC

Re: various problems converting to RAID1

As I was in a hurry, I forgot to mention some things:
> I rebooted my computer and the balance started continuing its work
Of course, after noticing there was no space left, I first deleted around
15GB of data to free some space, then tried to restart the balance. It
didn't work, so I checked the logs, noticed the problems and rebooted.
> I unmounted the FS and run btrfsck.
I also ran scrub before that; it didn't find any errors.

My kernel is from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc7-raring/

The file system may have been corrupted a few weeks earlier, as I enabled
qgroups to test how they work, but soon started getting strange memory
allocation failures from google-chrome, from btrfs itself, while trying to
hibernate, while using virtualbox, and some hard lock-ups too. I disabled
qgroups and everything went back to normal. I can dig up some kernel logs
when I get back home.

Regards

Piotr Pawłow

2012-Dec-06 08:16 UTC

Re: various problems converting to RAID1

When I try btrfsck in repair mode, it fails to fix the corruption (log 
below). Is there any other version of btrfsck besides the one at 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git that 
I could try?

(gdb) run
Starting program: /home/pp/btrfs-progs/btrfsck --repair /dev/mapper/pp-dysk4
enabling repair mode
ERROR: unable to scan the device '/dev/sda7' - Device or resource busy
ERROR: unable to scan the device '/dev/sdc7' - Device or resource busy
ERROR: unable to scan the device '/dev/sda7' - Device or resource busy
ERROR: unable to scan the device '/dev/sdc7' - Device or resource busy
checking extents
ref mismatch on [711069696 4096] extent item 1, found 0
btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff784c425 in __GI_raise (sig=<optimized out>) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff784c425 in __GI_raise (sig=<optimized out>) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff784fb8b in __GI_abort () at abort.c:91
#2  0x00007ffff78450ee in __assert_fail_base (fmt=<optimized out>, 
assertion=0x43d21e "!(ret)",
     file=0x43d210 "extent-tree.c", line=<optimized out>, 
function=<optimized out>) at assert.c:94
#3  0x00007ffff7845192 in __GI___assert_fail (assertion=0x43d21e 
"!(ret)", file=0x43d210 "extent-tree.c", line=2549,
     function=0x43d870 <__PRETTY_FUNCTION__.7967> 
"btrfs_reserve_extent") at assert.c:103
#4  0x000000000041ef7d in btrfs_reserve_extent (trans=0x64fed0, 
root=0x64f6e0, num_bytes=4096, empty_size=0,
     hint_byte=433791696896, search_end=18446744073709551615, 
ins=0x7fffffffdd00, data=52) at extent-tree.c:2549
#5  0x000000000041f218 in alloc_tree_block (trans=0x64fed0, 
root=0x64f6e0, num_bytes=4096, root_objectid=2,
     generation=289502, flags=0, key=0x7fffffffdd80, level=3, 
empty_size=0, hint_byte=433791696896,
     search_end=18446744073709551615, ins=0x7fffffffdd00) at 
extent-tree.c:2612
#6  0x000000000041f426 in btrfs_alloc_free_block (trans=0x64fed0, 
root=0x64f6e0, blocksize=4096, root_objectid=2,
     key=0x7fffffffdd80, level=3, hint=433791696896, empty_size=0) at 
extent-tree.c:2658
#7  0x000000000040d504 in __btrfs_cow_block (trans=0x64fed0, 
root=0x64f6e0, buf=0x6749c0, parent=0x0, parent_slot=0,
     cow_ret=0x7fffffffde68, search_start=433791696896, empty_size=0) at 
ctree.c:321
#8  0x000000000040d950 in btrfs_cow_block (trans=0x64fed0, 
root=0x64f6e0, buf=0x6749c0, parent=0x0, parent_slot=0,
     cow_ret=0x7fffffffde68) at ctree.c:410
#9  0x000000000040f464 in btrfs_search_slot (trans=0x64fed0, 
root=0x64f6e0, key=0x7fffffffdec0, p=0x30409b20, ins_len=0,
     cow=1) at ctree.c:1214
#10 0x000000000040a246 in delete_extent_records (trans=0x64fed0, 
root=0x64f6e0, path=0x30409b20, bytenr=711069696,
     new_len=4096) at btrfsck.c:2858
#11 0x000000000040aa2d in fixup_extent_refs (trans=0x64fed0, 
info=0x6513e0, rec=0xf7de760) at btrfsck.c:3078
#12 0x000000000040b1a3 in check_extent_refs (trans=0x64fed0, 
root=0xa0fa70, extent_cache=0x7fffffffe080, repair=1)
     at btrfsck.c:3317
#13 0x000000000040b89f in check_extents (trans=0x64fed0, root=0xa0fa70, 
repair=1) at btrfsck.c:3461
#14 0x000000000040bcd7 in main (ac=1, av=0x7fffffffe4d8) at btrfsck.c:3573
(gdb)