thr3ads.net - Btrfs devel - Errors not found by btrfsck or scrub [Jan 2013]

Chris Carlin

2013-Jan-11 18:13 UTC

Errors not found by btrfsck or scrub

I have a week-old filesystem that is reported clean by btrfsck and
scrub, but that fails under operations ranging from du to sync and
umount (but no failures if mounted readonly).

My problem sounds similar to a few other reports (e.g. TM's in
http://thread.gmane.org/gmane.comp.file-systems.btrfs/22014 ) that
seem to hint at problems with full metadata. My df shows:

# btrfs fi df /mnt/btrfs
Data, RAID0: total=776.32GB, used=717.56GB
Data: total=81.00GB, used=29.44GB
System, DUP: total=8.00MB, used=72.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=512.00MB, used=511.60MB
Metadata, DUP: total=1.00GB, used=1022.39MB

That looks suspicious to me, both the 1GB vs 1022MB and that there is
both DUP and RAID1 metadata. The balance operation I ran after adding
a second device finished without errors; could it have actually
failed? At this point balance DOES fail (locks up) every time...

This computer is Ubuntu, but I've updated to the latest kernel and
btrfs-tools I could find, and the problems remain.

Below is what showed up in dmesg during the run of scrub. Most of the
time the error is "btrfs: block rsv returned -28", but the aborted
transaction and auto-ro is always there.

Anything I can do to help identify a bug here? Clearly one problem is
that the filesystem checking tools can't find anything wrong, much
less fix the filesystem.

~Chris

scrub:
[12208.367036] btrfs: run_one_delayed_ref returned -28
[12208.367049] ------------[ cut here ]------------
[12208.367152] WARNING: at /home/apw/COD/linux/fs/btrfs/super.c:221
__btrfs_abort_transaction+0x99/0xb0 [btrfs]()
[12208.367155] Hardware name: KT600-8237
[12208.367158] btrfs: Transaction aborted
[12208.367161] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs
msdos jfs xfs reiserfs ext2 i2c_viapro serio_raw matrox_w1 wire
w83627hf hwmon_vid shpchp mac_hid lp parport btrfs zlib_deflate
libcrc32c hid_generic usbhid hid sata_via pata_via sata_sil r8169
[12208.367199] Pid: 1955, comm: btrfs-transacti Not tainted
3.5.7-03050702-generic #201212170935
[12208.367201] Call Trace:
[12208.367218]  [<c1045a52>] warn_slowpath_common+0x72/0xa0
[12208.367243]  [<e0960fa9>] ? __btrfs_abort_transaction+0x99/0xb0 [btrfs]
[12208.367267]  [<e0960fa9>] ? __btrfs_abort_transaction+0x99/0xb0 [btrfs]
[12208.367273]  [<c1045b23>] warn_slowpath_fmt+0x33/0x40
[12208.367297]  [<e0960fa9>] __btrfs_abort_transaction+0x99/0xb0 [btrfs]
[12208.367329]  [<e0976c0e>] btrfs_run_delayed_refs+0x29e/0x2e0 [btrfs]
[12208.367362]  [<e098763a>] btrfs_commit_transaction+0x3aa/0x8b0 [btrfs]
[12208.367371]  [<c1066eb0>] ? add_wait_queue+0x50/0x50
[12208.367400]  [<e0988208>] ? start_transaction+0x38/0x50 [btrfs]
[12208.367427]  [<e0981b1d>] transaction_kthread+0x1ed/0x260 [btrfs]
[12208.367437]  [<c107066e>] ? complete+0x4e/0x60
[12208.367464]  [<e0981930>] ?
btrfs_destroy_delayed_refs.isra.86+0x1c0/0x1c0 [btrfs]
[12208.367470]  [<c1066812>] kthread+0x72/0x80
[12208.367475]  [<c10667a0>] ? flush_kthread_worker+0x90/0x90
[12208.367484]  [<c15e5d3e>] kernel_thread_helper+0x6/0x10
[12208.367488] ---[ end trace d44c7f4de69ddd30 ]---
[12208.367493] BTRFS error (device sdb1) in
btrfs_run_delayed_refs:2455: error 28
[12208.367504] btrfs is forced readonly
[12208.367510] BTRFS warning (device sdb1): Skipping commit of aborted
transaction.
[12208.367515] BTRFS error (device sdb1) in cleanup_transaction:1226: error 28
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hugo Mills

2013-Jan-11 18:54 UTC

Re: Errors not found by btrfsck or scrub

On Fri, Jan 11, 2013 at 01:13:24PM -0500, Chris Carlin wrote:
> I have a week-old filesystem that is reported clean by btrfsck and
> scrub, but that fails under operations ranging from du to sync and
> umount (but no failures if mounted readonly).
> 
> My problem sounds similar to a few other reports (e.g. TM's in
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/22014 ) that
> seem to hint at problems with full metadata. My df shows:
> 
> # btrfs fi df /mnt/btrfs
> Data, RAID0: total=776.32GB, used=717.56GB
> Data: total=81.00GB, used=29.44GB
> System, DUP: total=8.00MB, used=72.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=512.00MB, used=511.60MB
> Metadata, DUP: total=1.00GB, used=1022.39MB
> 
> That looks suspicious to me, both the 1GB vs 1022MB and that there is

   1GB in this output is 1024 MiB (i.e. it's actually 1 GiB, not 1
GB), so it's not screwed-up accounting, just confusing reporting.

> both DUP and RAID1 metadata. The balance operation I ran after adding
> a second device finished without errors; could it have actually
> failed? At this point balance DOES fail (locks up) every time...

   The balance probably did half the DUP -> RAID-1 conversion of your
metadata and then had its problems. I wouldn't worry about this too
much.

> This computer is Ubuntu, but I've updated to the latest kernel and
> btrfs-tools I could find, and the problems remain.
> 
> Below is what showed up in dmesg during the run of scrub. Most of the
> time the error is "btrfs: block rsv returned -28", but the aborted
> transaction and auto-ro is always there.

   Just for reference, -28 is -ENOSPC.
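
As a quick way to check the errno mapping mentioned above, the following Python sketch looks up the symbolic name for 28. The helper name describe_kernel_errno is made up for illustration; it uses only the standard errno and os modules, and the message text comes from the local C library, so it may differ between platforms.

```python
import errno
import os


def describe_kernel_errno(value):
    """Map a kernel-style negative return code (e.g. -28) to (name, message).

    Hypothetical helper for illustration only. Kernel log lines report
    errors as negative errno values, so the sign is dropped first.
    """
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_kernel_errno(-28)
print(name, "-", message)
```

The same lookup works for any other negative code that shows up in a "returned -N" line in dmesg.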

> Anything I can do to help identify a bug here? Clearly one problem is
> that the filesystem checking tools can't find anything wrong, much
> less fix the filesystem.

> [12208.367199] Pid: 1955, comm: btrfs-transacti Not tainted
> 3.5.7-03050702-generic #201212170935
  ^^^^^
   There have been significant ENOSPC fixes since this point. A new
kernel (3.7) will probably help some of the way -- see below for some
of the details.

   The other thing I'd like to check is what balance command you're
using. With your current problems, I'd suggest the following:

# btrfs balance start -dusage=5 /mountpoint

   This will attempt to move the data in every data chunk which is
less than 5% full. With a 3.7 kernel (but not 3.5 IIRC(*)), that
should free up most of the 60 GiB of allocated but unused data space
you've got.

   Hugo.

(*) Kernels before 3.5, possibly 3.6 -- I can't recall exactly when it
got fixed -- had a problem where they'd massively overallocate chunks.
With those earlier kernels, you could end up with a situation like
yours which wouldn't be helped by the balance operation.
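
To make the effect of the -dusage=5 filter concrete, here is a small Python sketch of the selection rule as described above: only data chunks that are less than 5% full are picked for relocation. The Chunk class, the select_for_balance function and the chunk sizes are all invented for illustration; this is a toy model of the idea, not the kernel's implementation, and the numbers do not come from the filesystem in this thread.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A hypothetical data chunk: allocated size and bytes in use, in MiB."""
    size_mib: float
    used_mib: float

    @property
    def usage_percent(self) -> float:
        return 100.0 * self.used_mib / self.size_mib


def select_for_balance(chunks, usage_limit):
    """Return the chunks that are less than usage_limit percent full."""
    return [c for c in chunks if c.usage_percent < usage_limit]


# Invented example data: one nearly full chunk, one nearly empty,
# one completely empty and one a bit over half full.
chunks = [Chunk(1024, 1000), Chunk(1024, 30), Chunk(1024, 0), Chunk(1024, 600)]
selected = select_for_balance(chunks, usage_limit=5)

for chunk in selected:
    print(f"would relocate chunk with {chunk.usage_percent:.1f}% usage")
```

In this model only the nearly empty and empty chunks are selected, which is why a low usage value is cheap to run: very little data has to be rewritten to release a lot of allocated space.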

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Two things came out of Berkeley in the 1960s: LSD and Unix. ---   
                       This is not a coincidence.

Mitch Harder

2013-Jan-11 20:19 UTC

Re: Errors not found by btrfsck or scrub

On Fri, Jan 11, 2013 at 12:13 PM, Chris Carlin <chrisrcarlin@gmail.com> wrote:
> I have a week-old filesystem that is reported clean by btrfsck and
> scrub, but that fails under operations ranging from du to sync and
> umount (but no failures if mounted readonly).
>
> My problem sounds similar to a few other reports (e.g. TM's in
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/22014 ) that
> seem to hint at problems with full metadata. My df shows:
>
I know this advice will run counter to what everyone else is saying,
but I've had some luck booting with an older kernel (such as 3.4 or
3.5) just long enough to get some more Metadata allocated.

I would also caution you to back up your data.  I've had a similar
issue, and that file system soon showed additional corruptions.

Chris Carlin

2013-Jan-11 21:57 UTC

Re: Errors not found by btrfsck or scrub

Thanks for the response, Hugo!

This hard drive is not production, so I can afford to tinker with it
if it helps you guys track down anything interesting. Of course, I'd
prefer to restore it rather than wipe it...

The initial balance operation was run on kernel 3.2.0 with Ubuntu's
older version of btrfs-tools without any parameters/filters IIRC. Like
I said, the balance (falsely?) reported success. It was back in 3.2.0
where the filesystem started misbehaving, Mitch, so I don't think
older versions are getting me the metadata space, unless there's an
operation I'm missing.

Now I upgraded to kernel 3.7.2 (I'll never get used to the new
numbering scheme), but as with kernel 3.5.7, balance hangs without
backgrounding, and there's an error in dmesg. I ran balance with the
suggested -dusage=5.

The dmesg spiel is below.

~Chris

[  176.204750] btrfs: block rsv returned -28
[  176.204762] ------------[ cut here ]------------
[  176.204861] WARNING: at
/home/apw/COD/linux/fs/btrfs/extent-tree.c:6297
use_block_rsv+0x193/0x1a0 [btrfs]()
[  176.204865] Hardware name: KT600-8237
[  176.204868] Modules linked in: matrox_w1 wire i2c_viapro shpchp
serio_raw mac_hid w83627hf hwmon_vid lp parport hid_generic btrfs
usbhid zlib_deflate libcrc32c hid r8169 sata_via sata_sil
[  176.204896] Pid: 1703, comm: btrfs Not tainted 3.7.2-030702-generic
#201301111424
[  176.204900] Call Trace:
[  176.204916]  [<c104b0e2>] warn_slowpath_common+0x72/0xa0
[  176.204941]  [<e097ebe3>] ? use_block_rsv+0x193/0x1a0 [btrfs]
[  176.204964]  [<e097ebe3>] ? use_block_rsv+0x193/0x1a0 [btrfs]
[  176.204970]  [<c104b132>] warn_slowpath_null+0x22/0x30
[  176.204994]  [<e097ebe3>] use_block_rsv+0x193/0x1a0 [btrfs]
[  176.205020]  [<e098227b>] btrfs_alloc_free_block+0x3b/0x2b0 [btrfs]
[  176.205027]  [<c1116fee>] ? __set_page_dirty_nobuffers+0x1e/0x20
[  176.205053]  [<e098b164>] ? btree_set_page_dirty+0x34/0x40 [btrfs]
[  176.205058]  [<c1116295>] ? set_page_dirty+0x55/0x60
[  176.205095]  [<e09b3a61>] ? set_extent_buffer_dirty+0x71/0xb0 [btrfs]
[  176.205107]  [<c1607d2d>] ? _raw_spin_lock+0xd/0x10
[  176.205136]  [<e09b411c>] ? read_extent_buffer+0x9c/0x100 [btrfs]
[  176.205159]  [<e096dfc4>] __btrfs_cow_block+0x144/0x590 [btrfs]
[  176.205184]  [<e098caa4>] ?
btree_read_extent_buffer_pages.constprop.120+0xf4/0x140 [btrfs]
[  176.205207]  [<e096e54e>] btrfs_cow_block+0xde/0x230 [btrfs]
[  176.205230]  [<e097121f>] push_leaf_right+0xef/0x160 [btrfs]
[  176.205252]  [<e09718a7>] split_leaf+0x547/0x680 [btrfs]
[  176.205276]  [<e09704ff>] ? btrfs_leaf_free_space+0x3f/0x90 [btrfs]
[  176.205298]  [<e096fa65>] ? bin_search+0x75/0x90 [btrfs]
[  176.205321]  [<e097226e>] btrfs_search_slot+0x60e/0x660 [btrfs]
[  176.205344]  [<e097348d>] btrfs_insert_empty_items+0x5d/0xb0 [btrfs]
[  176.205367]  [<e097af9c>] alloc_reserved_tree_block+0x6c/0x270 [btrfs]
[  176.205391]  [<e097bae7>] run_delayed_tree_ref+0xd7/0x220 [btrfs]
[  176.205415]  [<e0980e64>] run_one_delayed_ref+0x134/0x150 [btrfs]
[  176.205439]  [<e0980f87>] run_clustered_refs+0x107/0x330 [btrfs]
[  176.205463]  [<e0983c3e>] btrfs_run_delayed_refs+0xae/0x2b0 [btrfs]
[  176.205490]  [<e09933ca>] btrfs_commit_transaction+0x40a/0x950 [btrfs]
[  176.205519]  [<e09b3a61>] ? set_extent_buffer_dirty+0x71/0xb0 [btrfs]
[  176.205527]  [<c106cd70>] ? add_wait_queue+0x50/0x50
[  176.205550]  [<e096d440>] ? btrfs_free_path+0x20/0x30 [btrfs]
[  176.205579]  [<e09b7b3b>] insert_balance_item+0x49b/0x590 [btrfs]
[  176.205609]  [<e09bc3a5>] btrfs_balance+0x335/0x5d0 [btrfs]
[  176.205619]  [<c115f8e0>] ? __sb_start_write+0x50/0xf0
[  176.205625]  [<c114da51>] ? kmem_cache_alloc_trace+0x111/0x140
[  176.205657]  [<e09c49bd>] btrfs_ioctl_balance+0xed/0x380 [btrfs]
[  176.205686]  [<e09c6591>] btrfs_ioctl+0x561/0x890 [btrfs]
[  176.205695]  [<c160b73f>] ? __do_page_fault+0x25f/0x4d0
[  176.205724]  [<e09c6030>] ? update_ioctl_balance_args+0x2c0/0x2c0
[btrfs]
[  176.205734]  [<c116d76f>] do_vfs_ioctl+0x7f/0x2f0
[  176.205740]  [<c116da50>] sys_ioctl+0x70/0x80
[  176.205749]  [<c160f40d>] sysenter_do_call+0x12/0x28
[  176.205753] ---[ end trace 7da78ca4ff782519 ]---
[  176.205783] btrfs: run_one_delayed_ref returned -28
[  176.205788] ------------[ cut here ]------------
[  176.205811] WARNING: at /home/apw/COD/linux/fs/btrfs/super.c:246
__btrfs_abort_transaction+0xde/0x100 [btrfs]()
[  176.205814] Hardware name: KT600-8237
[  176.205817] btrfs: Transaction aborted
[  176.205819] Modules linked in: matrox_w1 wire i2c_viapro shpchp
serio_raw mac_hid w83627hf hwmon_vid lp parport hid_generic btrfs
usbhid zlib_deflate libcrc32c hid r8169 sata_via sata_sil
[  176.205841] Pid: 1703, comm: btrfs Tainted: G        W
3.7.2-030702-generic #201301111424
[  176.205843] Call Trace:
[  176.205849]  [<c104b0e2>] warn_slowpath_common+0x72/0xa0
[  176.205872]  [<e096a1be>] ? __btrfs_abort_transaction+0xde/0x100
[btrfs]
[  176.205895]  [<e096a1be>] ? __btrfs_abort_transaction+0xde/0x100
[btrfs]
[  176.205900]  [<c104b1b3>] warn_slowpath_fmt+0x33/0x40
[  176.205923]  [<e096a1be>] __btrfs_abort_transaction+0xde/0x100 [btrfs]
[  176.205948]  [<e0983dbe>] btrfs_run_delayed_refs+0x22e/0x2b0 [btrfs]
[  176.205975]  [<e09933ca>] btrfs_commit_transaction+0x40a/0x950 [btrfs]
[  176.206003]  [<e09b3a61>] ? set_extent_buffer_dirty+0x71/0xb0 [btrfs]
[  176.206009]  [<c106cd70>] ? add_wait_queue+0x50/0x50
[  176.206032]  [<e096d440>] ? btrfs_free_path+0x20/0x30 [btrfs]
[  176.206060]  [<e09b7b3b>] insert_balance_item+0x49b/0x590 [btrfs]
[  176.206091]  [<e09bc3a5>] btrfs_balance+0x335/0x5d0 [btrfs]
[  176.206096]  [<c115f8e0>] ? __sb_start_write+0x50/0xf0
[  176.206102]  [<c114da51>] ? kmem_cache_alloc_trace+0x111/0x140
[  176.206131]  [<e09c49bd>] btrfs_ioctl_balance+0xed/0x380 [btrfs]
[  176.206160]  [<e09c6591>] btrfs_ioctl+0x561/0x890 [btrfs]
[  176.206166]  [<c160b73f>] ? __do_page_fault+0x25f/0x4d0
[  176.206196]  [<e09c6030>] ? update_ioctl_balance_args+0x2c0/0x2c0
[btrfs]
[  176.206202]  [<c116d76f>] do_vfs_ioctl+0x7f/0x2f0
[  176.206208]  [<c116da50>] sys_ioctl+0x70/0x80
[  176.206214]  [<c160f40d>] sysenter_do_call+0x12/0x28
[  176.206218] ---[ end trace 7da78ca4ff78251a ]---
[  176.206224] BTRFS error (device sdc1) in
btrfs_run_delayed_refs:2514: error 28
[  176.206235] btrfs is forced readonly
[  176.206241] BTRFS warning (device sdc1): Skipping commit of aborted
transaction.
[  176.206245] BTRFS error (device sdc1) in cleanup_transaction:1378: error 28

Chris Murphy

2013-Jan-11 22:27 UTC

Re: Errors not found by btrfsck or scrub

On Jan 11, 2013, at 2:57 PM, Chris Carlin <chrisrcarlin@gmail.com> wrote:
> Thanks for the response, Hugo!
> 
> This hard drive is not production, so I can afford to tinker with it
> if it helps you guys track down anything interesting. Of course, I'd
> prefer to restore it rather than wipe it…

What is the result of:

btrfs fi show

I see from the first post that it must be two disks/partitions, since
it's data raid0, and metadata partially raid1. And also it might be
vaguely useful to see a conventional df or df -h.

Based on some experiences I've had, and also seen on the list recently,
you might be able to back out of this situation by adding another device
to the volume. It almost doesn't matter how big it is. It could be a
small partition on another disk, or even a USB stick. I can't tell you
how much space. It might only need a few MB, but I'd give it what you
can. And then see if you can redo the balance. But it sounds to me like
the file system is very close to full; at least, it seems it can't
allocate more space for metadata.

Chris Murphy

Chris Carlin

2013-Jan-12 04:43 UTC

Re: Errors not found by btrfsck or scrub

> Based on some experiences I've had, and also seen on the list
> recently, you might be able to back out of this situation by adding
> another device to the volume. It almost doesn't matter how big it is.
> It could be a small partition on another disk, or even a USB stick. I
> can't tell you how much space. It might only need a few MB, but I'd
> give it what you can. And then see if you can redo the balance. But it
> sounds to me like the file system is very close to full, at least it
> can't allocate more space for metadata it seems.

Aha! Worked fine.

I added a temporary 10GB device, and everything instantly started
working. After I ran another balance command with -dusage=5, I deleted
the new device and everything's fine with the original two devices.
The temporary device was barely touched by the filesystem, if it was
touched at all.

A weird thing is "btrfs fi show" showed tens of gigs of unallocated
space the whole time.

Anyway, so for anyone who might have a similar problem in the future,
giving btrfs a little temporary scratch space did the trick.

~Chris
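
The thread does not record the exact commands Chris ran, so the following is only a sketch of the sequence he describes: add a scratch device, run the filtered balance, then remove the device again. It assumes the usual btrfs-progs subcommands (btrfs device add, btrfs balance start, btrfs device delete). The function name scratch_device_plan and the device path /dev/sdX1 are placeholders made up for illustration. The script only prints the commands and never executes them.

```python
import shlex


def scratch_device_plan(mountpoint, scratch_device, usage_limit=5):
    """Return the three-step recovery sequence as argument lists, in order.

    Hypothetical helper: it builds the commands but does not run them,
    so the plan can be reviewed before anything touches the filesystem.
    """
    return [
        # 1. Give the filesystem unallocated space to create new chunks in.
        ["btrfs", "device", "add", scratch_device, mountpoint],
        # 2. Relocate data chunks that are less than usage_limit% full.
        ["btrfs", "balance", "start", f"-dusage={usage_limit}", mountpoint],
        # 3. Move anything off the scratch device and remove it again.
        ["btrfs", "device", "delete", scratch_device, mountpoint],
    ]


# /dev/sdX1 is a placeholder; /mnt/btrfs is the mount point from the first post.
for command in scratch_device_plan("/mnt/btrfs", "/dev/sdX1"):
    print(shlex.join(command))
```

Printing rather than executing is deliberate: each step needs root and should be checked (for example with btrfs fi df after the balance) before moving on to the next.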

Chris Murphy

2013-Jan-12 05:01 UTC

Re: Errors not found by btrfsck or scrub

On Jan 11, 2013, at 9:43 PM, Chris Carlin <chrisrcarlin@gmail.com> wrote:
>> Based on some experiences I've had, and also seen on the list
>> recently, you might be able to back out of this situation by adding
>> another device to the volume. It almost doesn't matter how big it is.
>> It could be a small partition on another disk, or even a USB stick. I
>> can't tell you how much space. It might only need a few MB, but I'd
>> give it what you can. And then see if you can redo the balance. But it
>> sounds to me like the file system is very close to full, at least it
>> can't allocate more space for metadata it seems.
> 
> Aha! Worked fine.
> 
> I added a temporary 10GB device, and everything instantly started
> working. After I ran another balance command with -dusage=5, I deleted
> the new device and everything's fine with the original two devices.
> The temporary device was barely touched by the filesystem, if it was
> touched at all.
> 
> A weird thing is "btrfs fi show" showed tens of gigs of unallocated
> space the whole time.
> 
> Anyway, so for anyone who might have a similar problem in the future,
> giving btrfs a little temporary scratch space did the trick.

Well, you never did show us your "before" btrfs fi show, but going back
to your fi df from the original email:

Metadata, RAID1: total=512.00MB, used=511.60MB
Metadata, DUP: total=1.00GB, used=1022.39MB

The problem I see above is that it's out of space for metadata. There's
only about 2 MB of space left.

Yes, in this portion for data:

Data, RAID0: total=776.32GB, used=717.56GB
Data: total=81.00GB, used=29.44GB

This implies you have lots of space, but actually it's the first value
that matters in these cases because data and metadata allocations are
separate. This space you think you have free is allocated for data. I
don't know the details of whether, when, or how Btrfs can deallocate a
data chunk and reallocate it for metadata, but clearly it was unable or
unwilling to do so in this case. So when you added the temporary space,
it was immediately able to allocate an additional metadata chunk and
unwind itself.
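
The arithmetic behind this reading of the fi df output can be sketched in Python. The sample text is the output from the first post; the regular expression, the function name free_per_allocation and the unit table are assumptions made for this illustration (the units are treated as binary, as Hugo noted earlier in the thread), and the parser is only meant for this old output format.

```python
import re

FI_DF_OUTPUT = """\
Data, RAID0: total=776.32GB, used=717.56GB
Data: total=81.00GB, used=29.44GB
System, DUP: total=8.00MB, used=72.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=512.00MB, used=511.60MB
Metadata, DUP: total=1.00GB, used=1022.39MB
"""

# Multipliers to MiB, treating the printed units as binary.
# A value printed without a unit (only "0.00" here) is treated as MiB.
UNIT_TO_MIB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "": 1.0}

LINE_RE = re.compile(
    r"^(?P<kind>\w+)(?:, (?P<profile>\w+))?: "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMG]B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMG]B)?$"
)


def to_mib(value, unit):
    return float(value) * UNIT_TO_MIB[unit or ""]


def free_per_allocation(text):
    """Return (kind, profile, free MiB inside the allocation) per line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an allocation line
        total = to_mib(match["total"], match["tunit"])
        used = to_mib(match["used"], match["uunit"])
        rows.append((match["kind"], match["profile"] or "(none)", total - used))
    return rows


for kind, profile, free in free_per_allocation(FI_DF_OUTPUT):
    print(f"{kind:8s} {profile:6s} free within allocation: {free:10.2f} MiB")
```

For the two Metadata lines this gives roughly 0.4 MiB and 1.6 MiB free, while the Data lines show tens of GiB that are allocated to data and therefore of no help to metadata.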

Chris Murphy
