thr3ads.net - Btrfs devel - BUG at fs/btrfs/print-tree when trying to mount after a crash [Jun 2013]


Michael Zugelder

2013-Jun-17 19:10 UTC

BUG at fs/btrfs/print-tree when trying to mount after a crash

Hi,

my laptop with btrfs on dm-crypt on an SSD froze today shortly after
resuming from suspend (it doesn't normally do that). I was running a
self-compiled 3.9.6 at this point. There should be around 20 of 114 GiB
free on the file system and it was probably created with 16K leaf size.

After rebooting, mounting the rootfs didn't work anymore. I made a copy
of the disk and am now trying to fix it using my desktop. Trying to
mount it with -o recovery on 3.10-rc6 triggers the following bug:
> [  170.817246] BTRFS info (device dm-5): leaf 24297472 total ptrs 160 free space 3478
> [  170.817250]  item 0 key (19642265600 a8 53248) itemoff 16230 itemsize 53
> [  170.817251]          extent refs 18177 gen 189975 flags 34305
> [  170.817253]          extent data backref root 295 objectid 1647618 offset 1573046 count 162
> [  170.817254]  item 1 key (19642318848 a8 49152) itemoff 16177 itemsize 53
> [  170.817255]          extent refs 1 gen 150295 flags 1
> [  170.817257]          extent data backref root 259 objectid 1647675 offset 148434071453696 count 1
> [  170.817258]  item 2 key (19642368000 a8 53248) itemoff 16124 itemsize 53
> [  170.817259]          extent refs 1358954497 gen 335694615 flags 1124073473
> [  170.817260]          extent data backref root 1835267 objectid 1647675 offset 1835008 count 1
> [  170.817261]  item 3 key (19642421248 a8 45056) itemoff 16071 itemsize 53
> [  170.817262]          extent refs 1 gen 150295 flags 1
> [  170.817269] ------------[ cut here ]------------
> [  170.817292] kernel BUG at fs/btrfs/print-tree.c:136!
> [  170.817304] invalid opcode: 0000 [#1] PREEMPT SMP 
> [  170.817317] Modules linked in: mxm_wmi wmi i915 cfbfillrect cfbimgblt cfbcopyarea intel_agp intel_gtt drm_kms_helper
> [  170.817347] CPU: 1 PID: 3706 Comm: mount Tainted: G        W    3.10.0-rc6 #13
> [  170.817364] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.80 01/17/2013
> [  170.817384] task: ffff88041980b9f0 ti: ffff8803d0dc4000 task.ti: ffff8803d0dc4000
> [  170.817400] RIP: 0010:[<ffffffff812e3156>]  [<ffffffff812e3156>] btrfs_print_leaf+0x806/0x910
> [  170.817421] RSP: 0018:ffff8803d0dc5768  EFLAGS: 00010a87
> [  170.817433] RAX: 6900000000000103 RBX: 00000000000000a0 RCX: 000000000000005a
> [  170.817448] RDX: 0000000000003000 RSI: 0000000000003f45 RDI: ffff8803ca8c03f0
> [  170.817463] RBP: ffff8803d0dc57d8 R08: 0000000000004000 R09: ffff8803d0dc5710
> [  170.817478] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000003
> [  170.817493] R13: 0000000000003f44 R14: 000000000000005a R15: ffff8803ca8c03f0
> [  170.817508] FS:  00007fcb17a61840(0000) GS:ffff88042f240000(0000) knlGS:0000000000000000
> [  170.817526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  170.817538] CR2: 0000000002a42d00 CR3: 00000003d0cf6000 CR4: 00000000001407e0
> [  170.817553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  170.817568] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  170.817583] Stack:
> [  170.817588]  ffff880400000035 0000000000000035 000000000000b05a 0000000000000001
> [  170.817606]  00000035174c0d88 0000000000003f61 0000000000000020 a80000000492c790
> [  170.817623]  000000000000b000 ffff8804174c0d80 0000000492cb8000 0000000000000000
> [  170.817641] Call Trace:
> [  170.817649]  [<ffffffff812d9973>] __btrfs_free_extent+0x613/0xa20
> [  170.817663]  [<ffffffff8133501c>] ? btrfs_merge_delayed_refs+0x1fc/0x3c0
> [  170.817679]  [<ffffffff812dd25c>] run_clustered_refs+0x37c/0xd60
> [  170.817694]  [<ffffffff812e13e0>] btrfs_run_delayed_refs+0xd0/0x540
> [  170.817709]  [<ffffffff812ce157>] ? comp_keys+0x27/0x30
> [  170.817722]  [<ffffffff812f12c2>] btrfs_commit_transaction+0x82/0xac0
> [  170.817736]  [<ffffffff812d0f94>] ? btrfs_search_slot+0x504/0x920
> [  170.817750]  [<ffffffff812cbb42>] ? btrfs_release_path+0x22/0xb0
> [  170.817764]  [<ffffffff810adce0>] ? finish_wait+0x80/0x80
> [  170.817777]  [<ffffffff8132c543>] btrfs_recover_log_trees+0x3b3/0x480
> [  170.817791]  [<ffffffff81329a50>] ? add_inode_ref+0xa10/0xa10
> [  170.817804]  [<ffffffff812ee8b9>] open_ctree+0x1839/0x1f60
> [  170.817819]  [<ffffffff8170ee85>] ? ras_help+0x535/0xcd0
> [  170.817831]  [<ffffffff812c8483>] btrfs_mount+0x673/0x8f0
> [  170.817844]  [<ffffffff8111a0e6>] ? pcpu_next_pop+0x46/0x60
> [  170.817858]  [<ffffffff81152f2e>] mount_fs+0x3e/0x1b0
> [  170.817870]  [<ffffffff8116bdbf>] vfs_kern_mount+0x6f/0x110
> [  170.817883]  [<ffffffff8116e2b9>] do_mount+0x259/0xa20
> [  170.817897]  [<ffffffff8110a2b2>] ? __get_free_pages+0x12/0x50
> [  170.817911]  [<ffffffff8116dee1>] ? copy_mount_options+0x31/0x170
> [  170.817925]  [<ffffffff8116eb09>] SyS_mount+0x89/0xd0
> [  170.817938]  [<ffffffff817f0ad6>] system_call_fastpath+0x1a/0x1f
> [  170.817951] Code: ba 01 00 00 00 4c 89 ee 4c 89 ff 44 0f b6 f0 88 45 a0 e8 fe f7 ff ff 0f b6 4d a0 80 f9 b2 0f 84 d4 00 00 00 77 7a 80 f9 b0 74 33 <0f> 0b 44 89 e6 4c 89 ff e8 2c 59 50 00 4c 89 ff 89 c6 48 83 c6
> [  170.818021] RIP  [<ffffffff812e3156>] btrfs_print_leaf+0x806/0x910
> [  170.818036]  RSP <ffff8803d0dc5768>
> [  170.822960] ---[ end trace 224779f5de794488 ]---
> [  170.822962] note: mount[3706] exited with preempt_count 2
I presume the "btrfs_recover_log_trees" line is a good sign for my
data?
I have daily backups of 99% of the data, but would have to reinstall
some distro.

Btrfsck from git master (650e656a) spits out the following, before crashing:
> corrupt extent record: key 19642421248 168 45056
> corrupt extent record: key 19642904576 168 40960
> corrupt extent record: key 19643248640 168 49152
> corrupt extent record: key 19644252160 168 49152
> corrupt extent record: key 19644645376 168 40960
> corrupt extent record: key 19645878272 168 4096
> corrupt extent record: key 19646754816 168 524288
> ref mismatch on [19642265600 53248] extent item 18177, found 1
> Backref 19642265600 root 259 owner 1647675 offset 1572864 num_refs 0 not found in extent tree
> Incorrect local backref count on 19642265600 root 259 owner 1647675 offset 1572864 found 1 wanted 0 back 0x7c4b250
> Incorrect local backref count on 19642265600 root 295 owner 1647618 offset 1573046 found 0 wanted 162 back 0x3846100
> backpointer mismatch on [19642265600 53248]
[ ... snip, many similar errors ... ]
> Errors found in extent allocation tree
> checking free space cache
> btrfs: unable to add free space :-17
> btrfsck: free-space-cache.c:813: btrfs_add_free_space: Assertion `!(ret == -17)' failed.
I also took a picture when I first saw the problems directly after the
reboot and it showed an additional BUG. It is reproducible with the
self-compiled kernel, but not with the Fedora 18 3.9.5 kernel. Maybe
because Fedora doesn't compile with CONFIG_PREEMPT?
> BUG: scheduling while atomic: mount/354/0x10000003
> Modules linked in: nouveau ttm mxm_wmi wmi
> Pid: 354, comm: mount Tainted: G      D W    3.9.6 #11
> Call Trace:
>  __schedule_bug
>  __schedule
>  __cond_resched
>  _cond_resched
>  unmap_single_vma
>  unmap_vmas
>  exit_mmap
>  ? lock_hrtimer_base.isra.31
>  ? _raw_spin_unlock_irqrestore
>  ? _raw_spin_unlock_irq
>  mmput
>  do_exit
>  ? kmsg_dump
>  oops_end
>  die
>  do_trap
>  ? atomic_notifier_call_chain
>  do_invalid_op
>  btrfs_print_leaf[...]


Any suggestions? Never had a problem before running btrfs on that
machine and SSD for about 2 years now.


Thanks
Michael

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Duncan

2013-Jun-18 06:04 UTC


Re: BUG at fs/btrfs/print-tree when trying to mount after a crash

Michael Zugelder posted on Mon, 17 Jun 2013 21:10:27 +0200 as excerpted:
> Hi,
> 
> my laptop with btrfs on dm-crypt on an SSD froze today shortly after
> resuming from suspend (it doesn't normally do that). I was running a
> self-compiled 3.9.6 at this point. There should be around 20 of 114 GiB
> free on the file system and it was probably created with 16K leaf size.
> 
> After rebooting, mounting the rootfs didn't work anymore. I made a copy
> of the disk and am now trying to fix it using my desktop. Trying to
> mount it with -o recovery on 3.10-rc6 triggers the following bug: [snipped]
> Any suggestions? Never had a problem before running btrfs on that
> machine and SSD for about 2 years now.
The technical debugging's for others, but two suggestions as a btrfs
user/tester:

1) I had a similar issue some time back that turned out to be a
corrupted space cache.  Try mounting with the "nospace_cache" option.
If that works, that's it; mount with the "clear_cache" option to clear
the bad cache and perhaps once again with space_cache to turn it back
on.  (Space-cache is one of the few options that's persistent, but I'm
not sure if no-cache is equally persistent, making it a toggle, or if
once it's on there's no way to turn it off permanently.)

The clear_cache option will trigger a cache rebuild, so it will take some
time on slower devices (you said SSD but didn't say whether it was a slow
one or a fast one).  See the mount-options page at the wiki for more.

https://btrfs.wiki.kernel.org/index.php/Mount_options#Space_cache_control

2) Apparently some corruption bugs in 3.9 were fixed for 3.10.  It's
worth trying, say, the latest 3.10-rc kernel, to see if it can handle it.
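
The mount-option sequence from suggestion 1 can be sketched as a small dry-run helper. This is only an illustration: it builds the command lines as strings and does not execute anything. The mount point /mnt and the helper name are assumptions; the option names (nospace_cache, clear_cache, space_cache) are the ones mentioned above.

```python
import shlex
from typing import List

# Options in the order suggested above: first bypass the cache,
# then clear it, then turn it back on.
SPACE_CACHE_STEPS = ["nospace_cache", "clear_cache", "space_cache"]


def space_cache_recovery_commands(device: str, mountpoint: str) -> List[str]:
    """Build (but do not run) the mount/umount sequence for a cache reset."""
    commands = []
    for i, option in enumerate(SPACE_CACHE_STEPS):
        if i > 0:
            # Unmount before trying the next option.
            commands.append(shlex.join(["umount", mountpoint]))
        commands.append(shlex.join(["mount", "-o", option, device, mountpoint]))
    return commands


if __name__ == "__main__":
    # /mnt is a hypothetical mount point; adjust the device to your setup.
    for line in space_cache_recovery_commands("/dev/mapper/cryptbtrfs", "/mnt"):
        print(line)
```

Printing the commands first and running them by hand keeps each step visible, which matters when working on a copy of a damaged filesystem.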

As the wiki mentions in several places and as is frequently repeated 
here, btrfs is still under heavy development and the development kernels 
often have fixes missing in even latest stable.  So unless there's a
known regression in the current development kernel that you're
purposefully avoiding, really, relatively speaking, in terms of btrfs
development, latest mainline kernel stable is in practice already a bit
dated, and btrfs testers are encouraged to run the development kernels
from rc2 or rc3 at least.  (I can understand not wanting to run them
during the commit window, or until rc2/3, as I often hold off during the
commit window here, too.  Some even run btrfs-next, before it hits
mainline, but I've chosen to stick with mainline here.  That's just
simpler all around for me, and the rcs are still current /enough/.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Michael Zugelder

2013-Jun-18 12:11 UTC


Re: BUG at fs/btrfs/print-tree when trying to mount after a crash

Thanks for the reply.

On Tue, 2013-06-18 at 06:04 +0000, Duncan wrote:
[...]
> 1) I had a similar issue some time back that turned out to be a
> corrupted space cache.  Try mounting with the "nospace_cache" option.
> If that works, that's it; mount with the "clear_cache" option to clear
> the bad cache and perhaps once again with space_cache to turn it back
> on.  (Space-cache is one of the few options that's persistent, but I'm
> not sure if no-cache is equally persistent, making it a toggle, or if
> once it's on there's no way to turn it off permanently.)
> 
> The clear_cache option will trigger a cache rebuild, so it will take some
> time on slower devices (you said SSD but didn't say whether it was a slow
> one or a fast one).  See the mount-options page at the wiki for more.
> 
> https://btrfs.wiki.kernel.org/index.php/Mount_options#Space_cache_control
Tried it out, but it still BUGs the same way. I'm using a full-disk backup
on a regular HDD, so I can try everything without worrying if it makes
the situation worse.
> 2) Apparently some corruption bugs in 3.9 were fixed for 3.10.  It's
> worth trying, say, the latest 3.10-rc kernel, to see if it can handle it.
Unfortunately, I already tried using 3.10-rc6 in my first mail.
> As the wiki mentions in several places and as is frequently repeated 
> here, btrfs is still under heavy development and the development kernels 
> often have fixes missing in even latest stable.  So unless there's a
> known regression in the current development kernel that you're
> purposefully avoiding, really, relatively speaking, in terms of btrfs
> development, latest mainline kernel stable is in practice already a bit
> dated, and btrfs testers are encouraged to run the development kernels
> from rc2 or rc3 at least.  (I can understand not wanting to run them
> during the commit window, or until rc2/3, as I often hold off during the
> commit window here, too.  Some even run btrfs-next, before it hits
> mainline, but I've chosen to stick with mainline here.  That's just
> simpler all around for me, and the rcs are still current /enough/.)
I often upgrade the kernel on my desktop, since I have to shut it down
and boot it anyway because it crashes on resume with Xen. But I
don't reboot the notebook much. Probably only every few months when a
new major version is released.

Anyway, I tried using btrfsck from Josef Bacik's tree (commit f392a28d,
git://github.com/josefbacik/btrfs-progs.git) and it doesn't crash.
Output is this:
> Checking filesystem on /dev/mapper/cryptbtrfs
> UUID: b3a88070-748c-4f19-9c7c-c78e8232797c
> checking extents
> corrupt extent record: key 19642421248 168 45056
> corrupt extent record: key 19642904576 168 40960
> corrupt extent record: key 19643248640 168 49152
> corrupt extent record: key 19644252160 168 49152
> corrupt extent record: key 19644645376 168 40960
> corrupt extent record: key 19645878272 168 4096
> corrupt extent record: key 19646754816 168 524288
> ref mismatch on [19642265600 53248] extent item 18177, found 1
> Backref 19642265600 root 259 owner 1647675 offset 1572864 num_refs 0 not found in extent tree
> Incorrect local backref count on 19642265600 root 259 owner 1647675 offset 1572864 found 1 wanted 0 back 0x95f91a0
> Incorrect local backref count on 19642265600 root 295 owner 1647618 offset 1573046 found 0 wanted 162 back 0x4efb2b0
> Backref disk bytenr does not match extent record, bytenr=19642265600, ref bytenr=82290641
> backpointer mismatch on [19642265600 53248]
> Backref 19642318848 root 259 owner 1647675 offset 1703936 num_refs 0 not found in extent tree
> Incorrect local backref count on 19642318848 root 259 owner 1647675 offset 1703936 found 1 wanted 0 back 0x95f92c0
> Incorrect local backref count on 19642318848 root 259 owner 1647675 offset 148434071453696 found 0 wanted 1 back 0x4efb310
> Backref disk bytenr does not match extent record, bytenr=19642318848, ref bytenr=1
> backpointer mismatch on [19642318848 49152]
[ ... ]
> Errors found in extent allocation tree
> checking free space cache
> checking fs roots
> root 1597 inode 553154 errors 0
>         unresolved ref dir 473796 index 153 namelen 17 name browser_tests.log filetype 1 error 4
>         unresolved ref dir 473796 index 17179869337 namelen 17 name brwser_te6ts.log filetype 0 error 3
> found 62413098968 bytes used err is 1
> total csum bytes: 71717016
> total tree bytes: 2047426560
> total fs tree bytes: 1812643840
> total extent tree bytes: 129286144
> btree space waste bytes: 427714485
> file data blocks allocated: 186676146176
>  referenced 125879046144
> Btrfs v0.20-rc1
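
One detail that is easy to trip over when comparing the kernel log with the btrfsck output: the kernel prints the key type in hex (a8) while btrfsck prints it in decimal (168). The sketch below checks that, and also XORs a few of the odd-looking values against the values that btrfsck found or that the neighbouring items carry. Which value counts as the "reference" in each pair is an assumption made for illustration; this is only arithmetic on the numbers in the logs, not a diagnosis.

```python
# Key type as printed by the kernel (hex) and by btrfsck (decimal).
KERNEL_KEY_TYPE = "a8"
BTRFSCK_KEY_TYPE = 168


def differing_bits(suspect: int, reference: int) -> str:
    """Return the XOR of two values as hex, i.e. the bits in which they differ."""
    return hex(suspect ^ reference)


# (field, suspicious value from the logs, assumed reference value).
# References come from btrfsck's "found" values or from neighbouring items.
PAIRS = [
    ("item 0 extent refs", 18177, 1),
    ("item 2 extent refs", 1358954497, 1),
    ("item 2 gen", 335694615, 150295),
    ("item 1 backref offset", 148434071453696, 1703936),
    ("dir index", 17179869337, 153),
]

if __name__ == "__main__":
    print("key types match:", int(KERNEL_KEY_TYPE, 16) == BTRFSCK_KEY_TYPE)
    for name, suspect, reference in PAIRS:
        print(f"{name}: differing bits {differing_bits(suspect, reference)}")
```

With these inputs every XOR leaves the lowest byte untouched, so the suspicious values agree with their references in the low bits and differ only higher up.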
Any ideas?


Thanks
Michael


Michael Zugelder

2013-Jun-19 20:57 UTC


Re: BUG at fs/btrfs/print-tree when trying to mount after a crash

Hi,

here's an update on my situation.

On Tue, 2013-06-18 at 14:11 +0200, Michael Zugelder wrote:
> Anyway, I tried using btrfsck from Josef Bacik's tree (commit f392a28d,
> git://github.com/josefbacik/btrfs-progs.git) and it doesn't crash.
> Output is this:
> 
> > Checking filesystem on /dev/mapper/cryptbtrfs
> > UUID: b3a88070-748c-4f19-9c7c-c78e8232797c
> > checking extents
> > corrupt extent record: key 19642421248 168 45056
> > corrupt extent record: key 19642904576 168 40960
> > corrupt extent record: key 19643248640 168 49152
> > corrupt extent record: key 19644252160 168 49152
> > corrupt extent record: key 19644645376 168 40960
> > corrupt extent record: key 19645878272 168 4096
> > corrupt extent record: key 19646754816 168 524288
> > ref mismatch on [19642265600 53248] extent item 18177, found 1
> > Backref 19642265600 root 259 owner 1647675 offset 1572864 num_refs 0 not found in extent tree
> > Incorrect local backref count on 19642265600 root 259 owner 1647675 offset 1572864 found 1 wanted 0 back 0x95f91a0
> > Incorrect local backref count on 19642265600 root 295 owner 1647618 offset 1573046 found 0 wanted 162 back 0x4efb2b0
> > Backref disk bytenr does not match extent record, bytenr=19642265600, ref bytenr=82290641
> > backpointer mismatch on [19642265600 53248]
> > Backref 19642318848 root 259 owner 1647675 offset 1703936 num_refs 0 not found in extent tree
> > Incorrect local backref count on 19642318848 root 259 owner 1647675 offset 1703936 found 1 wanted 0 back 0x95f92c0
> > Incorrect local backref count on 19642318848 root 259 owner 1647675 offset 148434071453696 found 0 wanted 1 back 0x4efb310
> > Backref disk bytenr does not match extent record, bytenr=19642318848, ref bytenr=1
> > backpointer mismatch on [19642318848 49152]
> [ ... ]
> > Errors found in extent allocation tree
> > checking free space cache
> > checking fs roots
> > root 1597 inode 553154 errors 0
> >         unresolved ref dir 473796 index 153 namelen 17 name browser_tests.log filetype 1 error 4
> >         unresolved ref dir 473796 index 17179869337 namelen 17 name brwser_te6ts.log filetype 0 error 3
> > found 62413098968 bytes used err is 1
> > total csum bytes: 71717016
> > total tree bytes: 2047426560
> > total fs tree bytes: 1812643840
> > total extent tree bytes: 129286144
> > btree space waste bytes: 427714485
> > file data blocks allocated: 186676146176
> >  referenced 125879046144
> > Btrfs v0.20-rc1
I recently discovered that the git btrfsck has a --repair option. It
seems like it was able to fix most of the errors. Here's what I did to
get rid of the rest (output from btrfsck):
> unresolved ref dir 473796 index 153 namelen 17 name browser_tests.log filetype 1 error 4
> unresolved ref dir 473796 index 17179869337 namelen 17 name brwser_te6ts.log filetype 0 error 3
Could not delete the file after mounting, so I just deleted the entire
subvolume and it is gone now.
> root 260  inode 12345 errors 42
Used "find -inum 12345" to see which files referenced it. Since it was
just a sqlite journal file, I deleted it. But it was also referenced by
the snapshots of the last 24 hours, so I just deleted them, too.
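
For anyone wanting to script the inode lookup, here is a minimal sketch of what find's -inum test does: walk a tree and collect every path whose inode number matches. The function name is made up for this example. Snapshots keep the inode numbers of the files they share with the original subvolume, so a walk that covers snapshots can return several paths for one number.

```python
import os
from typing import List


def find_by_inode(root: str, inode: int) -> List[str]:
    """Return all paths under root whose inode number equals inode."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are checked themselves, not followed.
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
    return matches


if __name__ == "__main__":
    # 12345 is the inode number from the btrfsck line above; "." is just
    # a placeholder for the mounted subvolume.
    for path in find_by_inode(".", 12345):
        print(path)
```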
> cache and super generation don't match, space cache will be invalidated
Mounted a few times with -o clear_cache, nospace_cache and then
space_cache. Dmesg shows "btrfs: disk space caching is enabled" and no
errors, so I hope it works again. Never noticed any long mount/unmount
times or heavy cpu/disk activity, though.

Then I ran btrfs scrub over it, which found 0 errors, but there were
still some corruption issues from the btrfsck repair. I used rpm -Va and
reinstalled a few packages whose files were missing and added myself to
the relevant groups again.
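
A rough sketch of pulling the missing files out of saved "rpm -Va" output, assuming the usual layout in which such lines start with the word "missing", optionally followed by a one-letter attribute marker, then the path. The sample paths are invented for the example and the function name is hypothetical.

```python
from typing import List


def missing_files(rpm_va_output: str) -> List[str]:
    """Return the paths reported as missing in `rpm -Va` output."""
    paths = []
    for line in rpm_va_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "missing":
            rest = fields[1]
            # Skip an optional attribute marker (for example "c") before the path.
            if "/" in rest:
                paths.append(rest[rest.index("/"):].strip())
    return paths


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the affected machine.
    sample = (
        "missing     /usr/bin/foo\n"
        "S.5....T.  c /etc/bar.conf\n"
        "missing   c /etc/baz.conf\n"
    )
    for path in missing_files(sample):
        print(path)
```

Lines that only report changed attributes (size, checksum, mtime) are ignored here; after a repair they may deserve a look as well.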

The system seems to work fine now, but I'll probably keep some more
easily restorable backups from now on. If you want any details to be
able to fix some of the BUGs, assertion errors and/or crashes I
encountered, I'm glad to help; I still have the original corrupted image.


Michael

