David Sterba
2012-Jan-05 16:49 UTC
Crash in io_ctl_drop_pages after mount with csum errors
I mounted a multi-volume fs, created not so long ago with a 3.1-based
kernel, using v3.2-rc7-83-g115e8e7, and it crashed immediately.
It's quite possible that the disk is to blame, it's an old 160G
SP1614C, but syslog does not contain any error messages. I'm not sure
whether the fs was cleanly unmounted (it seems not), but either way I do
not expect a crash.
Label: none uuid: 5f06f9eb-9736-49f7-91a2-2f45522512ef
Total devices 4 FS bytes used 1.38GB
devid 4 size 34.00GB used 34.00GB path /dev/sdg8
devid 3 size 34.00GB used 34.00GB path /dev/sdg7
devid 2 size 34.00GB used 34.00GB path /dev/sdg6
devid 1 size 34.00GB used 34.00GB path /dev/sdg5
mount options: compress-force=lzo,space_cache,autodefrag,inode_cache
[ 1461.732855] btrfs: force lzo compression
[ 1461.732876] btrfs: enabling auto defrag
[ 1461.732893] btrfs: enabling inode map caching
[ 1461.732907] btrfs: disk space caching is enabled
[ 1499.796181] btrfs: csum mismatch on free space cache
[ 1499.796266] btrfs: failed to load free space cache for block group 29360128
[ 1499.888699] btrfs csum failed ino 18446744073709551604 off 65536 csum 2566472073 private 1925235876
[ 1499.888826] btrfs csum failed ino 18446744073709551604 off 327680 csum 2566472073 private 1925235876
[ 1499.906229] btrfs csum failed ino 18446744073709551604 off 0 csum 1695430581 private 1170642078
[ 1499.906345] btrfs csum failed ino 18446744073709551604 off 262144 csum 2566472073 private 1925235876
[ 1499.906446] btrfs csum failed ino 18446744073709551604 off 524288 csum 2566472073 private 1925235876
[ 1499.924469] btrfs csum failed ino 18446744073709551604 off 196608 csum 2566472073 private 1925235876
[ 1499.924574] btrfs csum failed ino 18446744073709551604 off 458752 csum 2566472073 private 1925235876
[ 1499.946076] btrfs csum failed ino 18446744073709551604 off 131072 csum 2566472073 private 1925235876
[ 1499.946217] btrfs csum failed ino 18446744073709551604 off 393216 csum 2566472073 private 1925235876
[ 1499.946318] btrfs csum failed ino 18446744073709551604 off 0 csum 1695430581 private 1170642078
[ 1499.946362] btrfs: error reading free space cache
[ 1499.946409] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 1499.946437] IP: [<ffffffffa0456dd7>] io_ctl_drop_pages+0x37/0x70 [btrfs]
[ 1499.946515] PGD 125ce4067 PUD 126941067 PMD 0
[ 1499.946539] Oops: 0002 [#1] PREEMPT SMP
[ 1499.946560] CPU 0
[ 1499.946569] Modules linked in: btrfs zlib_deflate aoe nfs lockd fscache
auth_rpcgss nfs_acl sunrpc af_packet cpufreq_conservative cpufreq_userspace
cpufreq_powersave powernow_k8 mperf snd_hda_codec_analog snd_hda_intel
snd_hda_codec sg sp5100_tco snd_hwdep snd_pcm amd64_edac_mod snd_timer pcspkr
edac_core snd edac_mce_amd firewire_ohci firewire_core crc_itu_t i2c_piix4
k8temp asus_atk0110 soundcore snd_page_alloc sky2 autofs4 nouveau ttm
drm_kms_helper drm processor i2c_algo_bit mxm_wmi wmi video thermal_sys button
pata_via sata_promise sata_via ata_generic sata_sil pata_atiixp
[ 1499.946832]
[ 1499.946843] Pid: 2799, comm: rm Not tainted 3.2.0-rc7-1-desktop #1
[ 1499.946880] RIP: 0010:[<ffffffffa0456dd7>] [<ffffffffa0456dd7>] io_ctl_drop_pages+0x37/0x70 [btrfs]
[ 1499.946936] RSP: 0018:ffff880127c6bc48 EFLAGS: 00010202
[ 1499.946951] RAX: 0000000000000001 RBX: ffff880127c6bcf0 RCX: ffff88012ffa3000
[ 1499.946971] RDX: 0000000000000000 RSI: ffffea0003ec0c80 RDI: ffffea0003ec0c80
[ 1499.946989] RBP: 0000000000000001 R08: 6400000000000000 R09: a8000fb032000000
[ 1499.947008] R10: 57ffda4fd1ec0c80 R11: 0000000000000000 R12: 0000000000000001
[ 1499.947028] R13: ffff880126d519b0 R14: 000000000002005a R15: 0000000000000001
[ 1499.947052] FS: 00007f6a9aa1c700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[ 1499.947078] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1499.947097] CR2: 0000000000000001 CR3: 00000001275e5000 CR4: 00000000000006f0
[ 1499.947120] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1499.947143] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1499.947167] Process rm (pid: 2799, threadinfo ffff880127c6a000, task ffff880126378280)
[ 1499.947551] Stack:
[ 1499.947551] 0000000000000000 ffff880127c6bcf0 0000000000000000 ffffffffa0457e2e
[ 1499.947551] 0000000000000020 ffffea0003ec0c80 ffff880126d51980 ffff880127c6bd48
[ 1499.947551] ffff880126d51980 0000000000000de0 ffff880125d13720 ffff8801267e6600
[ 1499.947551] Call Trace:
[ 1499.947551] [<ffffffffa0457e2e>] io_ctl_prepare_pages.isra.31+0x9e/0x150 [btrfs]
[ 1499.947551] [<ffffffffa0459d3f>] __load_free_space_cache+0x1ff/0x610 [btrfs]
[ 1499.947551] [<ffffffffa045b134>] load_free_ino_cache+0xd4/0x100 [btrfs]
[ 1499.947551] [<ffffffffa041a956>] start_caching+0x86/0x130 [btrfs]
[ 1499.947551] [<ffffffffa041aab5>] btrfs_return_ino+0xb5/0x170 [btrfs]
[ 1499.947551] [<ffffffffa042dc6b>] btrfs_evict_inode+0x2cb/0x320 [btrfs]
[ 1499.947551] [<ffffffff811745af>] evict+0x9f/0x1a0
[ 1499.947551] [<ffffffff8116968f>] do_unlinkat+0x15f/0x1d0
[ 1499.947551] [<ffffffff815c7812>] system_call_fastpath+0x16/0x1b
[ 1499.947551] [<00007f6a9a5539b7>] 0x7f6a9a5539b6
[ 1499.947551] Code: 0f 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 8b 43 34 85 c0 7e 3a 31 ed 0f 1f 00 48 8b 43 18 4c 63 e5 4a 8b 04 e0 48 83 c0 01 <f0> 80 20 fe 48 8b 43 18 83 c5 01 4a 8b 3c e0 e8 75 4f ca e0 48
[ 1499.947551] RIP [<ffffffffa0456dd7>] io_ctl_drop_pages+0x37/0x70 [btrfs]
[ 1499.947551] RSP <ffff880127c6bc48>
[ 1499.947551] CR2: 0000000000000001
[ 1499.977841] ---[ end trace 22016411c26ba8c7 ]---
It tries to dereference 0x1, which looks like an int return value used
instead of a pointer:
(gdb) l *(io_ctl_drop_pages+0x37)
0x627c7 is in io_ctl_drop_pages (fs/btrfs/free-space-cache.c:321).
316	{
317		int i;
318
319		io_ctl_unmap_page(io_ctl);
320
321		for (i = 0; i < io_ctl->num_pages; i++) {
322			ClearPageChecked(io_ctl->pages[i]);
323			unlock_page(io_ctl->pages[i]);
324			page_cache_release(io_ctl->pages[i]);
325		}
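
One possible reading of the 0x1 address, offered as a hedged sketch rather
than a confirmed diagnosis: in the Code: line of the oops, the bytes
48 83 c0 01 decode to "add $0x1,%rax" and the faulting f0 80 20 fe to
"lock andb $0xfe,(%rax)". That is the shape of a byte-wide atomic clear of
bit 8 in a word that starts at the original RAX. If ClearPageChecked()
clears bit 8 of page->flags and flags sits at offset 0 of struct page (both
are assumptions about this kernel build, the trace does not state them),
then a NULL io_ctl->pages[i] would fault at exactly 0x1. The helpers below
are hypothetical user-space arithmetic, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PG_checked is bit 8 of page->flags on this kernel, and
 * flags is the first member of struct page (offset 0). */
#define PG_CHECKED_BIT 8u

/* Byte address touched by a byte-wide clear of `bit`, given the address
 * of the 64-bit flags word (little-endian layout, as on x86-64). */
static uintptr_t bit_byte_addr(uintptr_t flags_addr, unsigned bit)
{
	assert(bit < 64);	/* flags is modelled as one 64-bit word */
	return flags_addr + bit / 8;
}

/* Mask that a byte-wide AND applies to clear `bit` inside that byte. */
static uint8_t bit_clear_mask(unsigned bit)
{
	return (uint8_t)~(1u << (bit % 8));
}
```

Calling bit_byte_addr(0, PG_CHECKED_BIT) models a NULL page and gives 0x1,
the value seen in RAX and CR2; bit_clear_mask(PG_CHECKED_BIT) gives 0xfe,
the immediate in the faulting instruction.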
After reboot:
# btrfsck /dev/sdg5
root 5 inode 18446744073709551604 errors 2000
root 5 inode 18446744073709551605 errors 1
found 1482883072 bytes used err is 1
total csum bytes: 30824
total tree bytes: 972619776
total fs tree bytes: 969998336
btree space waste bytes: 192136036
file data blocks allocated: 510263296
referenced 917307392
Btrfs v0.19+
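
The inode numbers in the csum and btrfsck messages are small negative
values printed as u64. The sketch below reinterprets them as signed. The
mapping of -11 and -12 to the free space cache and free ino cache objectids
is recalled from fs/btrfs/ctree.h (BTRFS_FREE_SPACE_OBJECTID and
BTRFS_FREE_INO_OBJECTID) and should be treated as an assumption to check
against the tree in use; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reinterpret an objectid printed as unsigned 64-bit as a signed value.
 * The conversion relies on two's complement, which holds on x86-64. */
static int64_t objectid_as_signed(uint64_t ino)
{
	return (int64_t)ino;
}

/* Assumed mapping (see lead-in): -11 free space cache, -12 free ino
 * cache. Returns NULL for any other objectid. */
static const char *cache_name_for_objectid(uint64_t ino)
{
	switch (objectid_as_signed(ino)) {
	case -11:
		return "free space cache";
	case -12:
		return "free ino cache";
	default:
		return NULL;
	}
}
```

Under that assumption, 18446744073709551604 in the csum failures is -12
and 18446744073709551605 in the btrfsck output is -11.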
A subsequent "mount /dev/sdg5 /mnt/test" went fine, but umount is stuck:
PID TTY STAT TIME COMMAND
2441 ? D 0:00 [btrfs-worker-1]
[<ffffffff810fbd19>] sleep_on_page+0x9/0x10
[<ffffffff810fbd02>] __lock_page+0x62/0x70
[<ffffffffa04462b5>] read_extent_buffer_pages+0x275/0x510 [btrfs]
[<ffffffffa041fa80>] btree_read_extent_buffer_pages.isra.101+0x80/0xc0 [btrfs]
[<ffffffffa0421030>] csum_dirty_buffer+0xd0/0x240 [btrfs]
[<ffffffffa04211d5>] __btree_submit_bio_start+0x35/0x70 [btrfs]
[<ffffffffa044ef51>] worker_loop+0xa1/0x2a0 [btrfs]
[<ffffffff8107799e>] kthread+0x7e/0x90
[<ffffffff815c99f4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
PID TTY STAT TIME COMMAND
2457 pts/1 D+ 0:00 umount /mnt/test
[<ffffffff810fbd19>] sleep_on_page+0x9/0x10
[<ffffffff810fbe4f>] wait_on_page_bit+0x6f/0x80
[<ffffffffa0444cd5>] extent_write_cache_pages.isra.22.constprop.32+0x295/0x390 [btrfs]
[<ffffffffa0444fff>] extent_writepages+0x3f/0x60 [btrfs]
[<ffffffff810fd8dc>] __filemap_fdatawrite_range+0x4c/0x60
[<ffffffffa0425ac8>] btrfs_write_marked_extents+0x68/0xb0 [btrfs]
[<ffffffffa0425be6>] btrfs_write_and_wait_marked_extents+0x26/0x60 [btrfs]
[<ffffffffa0426371>] btrfs_commit_transaction+0x601/0x860 [btrfs]
[<ffffffff81188538>] __sync_filesystem+0x58/0x90
[<ffffffff8115c924>] generic_shutdown_super+0x34/0xe0
[<ffffffff8115ca59>] kill_anon_super+0x9/0x20
[<ffffffff8115d013>] deactivate_locked_super+0x33/0x90
[<ffffffff8117a3b1>] sys_umount+0x51/0xc0
[<ffffffff815c7812>] system_call_fastpath+0x16/0x1b
[<00007f001aca65d7>] 0x7f001aca65d7
[<ffffffffffffffff>] 0xffffffffffffffff
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2012-Jan-06 07:17 UTC
Re: Crash in io_ctl_drop_pages after mount with csum errors
David Sterba wrote:
> I mounted a multi-volume fs created not-so-long ago in a 3.1 based
> kernel and mounted with v3.2-rc7-83-g115e8e7 , it crashed immediately.
> It's quite possible that the disk is to blame, it's an old 160G
> SP1614C, but syslog does not contain any error messages. I'm not sure
> whether the fs was cleanly unmounted, seems not, but anyway I do not
> expect a crash.
>
> Label: none uuid: 5f06f9eb-9736-49f7-91a2-2f45522512ef
> Total devices 4 FS bytes used 1.38GB
> devid 4 size 34.00GB used 34.00GB path /dev/sdg8
> devid 3 size 34.00GB used 34.00GB path /dev/sdg7
> devid 2 size 34.00GB used 34.00GB path /dev/sdg6
> devid 1 size 34.00GB used 34.00GB path /dev/sdg5
>
> mount options: compress-force=lzo,space_cache,autodefrag,inode_cache
>
> [ 1461.732855] btrfs: force lzo compression
> [ 1461.732876] btrfs: enabling auto defrag
> [ 1461.732893] btrfs: enabling inode map caching
> [ 1461.732907] btrfs: disk space caching is enabled
> [ 1499.796181] btrfs: csum mismatch on free space cache
> [ 1499.796266] btrfs: failed to load free space cache for block group 29360128
> [ 1499.888699] btrfs csum failed ino 18446744073709551604 off 65536 csum 2566472073 private 1925235876
> [ 1499.888826] btrfs csum failed ino 18446744073709551604 off 327680 csum 2566472073 private 1925235876
> [ 1499.906229] btrfs csum failed ino 18446744073709551604 off 0 csum 1695430581 private 1170642078
> [ 1499.906345] btrfs csum failed ino 18446744073709551604 off 262144 csum 2566472073 private 1925235876
> [ 1499.906446] btrfs csum failed ino 18446744073709551604 off 524288 csum 2566472073 private 1925235876
> [ 1499.924469] btrfs csum failed ino 18446744073709551604 off 196608 csum 2566472073 private 1925235876
> [ 1499.924574] btrfs csum failed ino 18446744073709551604 off 458752 csum 2566472073 private 1925235876
> [ 1499.946076] btrfs csum failed ino 18446744073709551604 off 131072 csum 2566472073 private 1925235876
> [ 1499.946217] btrfs csum failed ino 18446744073709551604 off 393216 csum 2566472073 private 1925235876
> [ 1499.946318] btrfs csum failed ino 18446744073709551604 off 0 csum 1695430581 private 1170642078
> [ 1499.946362] btrfs: error reading free space cache

We have inconsistent data on disk in both the free space cache and the
free ino cache.

> [ 1499.946409] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [ 1499.946437] IP: [<ffffffffa0456dd7>] io_ctl_drop_pages+0x37/0x70 [btrfs]

0x01 is weird, I don't know how it occurred. Nevertheless we need this fix:

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..81771ca 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
 	io_ctl_unmap_page(io_ctl);
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
-		ClearPageChecked(io_ctl->pages[i]);
-		unlock_page(io_ctl->pages[i]);
-		page_cache_release(io_ctl->pages[i]);
+		if (io_ctl->pages[i]) {
+			ClearPageChecked(io_ctl->pages[i]);
+			unlock_page(io_ctl->pages[i]);
+			page_cache_release(io_ctl->pages[i]);
+		}
 	}
 }

I'll resend the patch along with my other pending patches for 3.3.
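
The guard added by the patch can be modelled in user space. This is a
minimal sketch with hypothetical stand-in types (fake_page, fake_io_ctl)
and a hypothetical drop_pages() helper. It only illustrates skipping slots
that were never populated, and it says nothing about why a slot was left
NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct io_ctl. */
struct fake_page {
	int locked;	/* models the page lock */
	int refs;	/* models the page reference count */
};

struct fake_io_ctl {
	struct fake_page **pages;
	int num_pages;
};

/* Unlock and release every populated slot, skipping NULL ones.
 * Returns how many pages were released. */
static int drop_pages(struct fake_io_ctl *ctl)
{
	int released = 0;

	for (int i = 0; i < ctl->num_pages; i++) {
		struct fake_page *p = ctl->pages[i];

		if (!p)		/* slot never set up: nothing to undo */
			continue;
		p->locked = 0;
		p->refs--;
		released++;
	}
	return released;
}
```

With a three-slot array where only the first slot is populated, the loop
releases one page and leaves the two NULL slots alone instead of
dereferencing them.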
David Sterba
2012-Jan-06 15:55 UTC
Re: Crash in io_ctl_drop_pages after mount with csum errors
On Fri, Jan 06, 2012 at 03:17:59PM +0800, Li Zefan wrote:
> > [ 1499.946409] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [ 1499.946437] IP: [<ffffffffa0456dd7>] io_ctl_drop_pages+0x37/0x70 [btrfs]
>
> 0x01 is weird, I don't know how it occurred. Nevertheless we need this fix:
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index ec23d43..81771ca 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
>  	io_ctl_unmap_page(io_ctl);
>
>  	for (i = 0; i < io_ctl->num_pages; i++) {
> -		ClearPageChecked(io_ctl->pages[i]);
> -		unlock_page(io_ctl->pages[i]);
> -		page_cache_release(io_ctl->pages[i]);
> +		if (io_ctl->pages[i]) {
> +			ClearPageChecked(io_ctl->pages[i]);
> +			unlock_page(io_ctl->pages[i]);
> +			page_cache_release(io_ctl->pages[i]);
> +		}
>  	}
>  }

Mount did not crash with this fix, but anything that touches files still
causes the crash. umount is still stuck the same way as before.

I'll not touch the partitions in case you have patches to test.

david