Mathieu Chouquet-Stringer
2012-Dec-06 14:34 UTC
GPF in read_extent_buffer while scrubbing on 3.7.0-rc8-00014-g27d7c2a
Hello, Using the last couple of kernels (3.6 or 3.7), scrubbing my btrfs fs (which is on a luks based lvm device) will always end up crashing my pc. The error (screenshot: http://mathieu.csetco.com/btrfs-scrub-crash.jpeg) looks very similar to this crash (that''s why I''m Cc''ing Sami): btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2 http://marc.info/?t=134127076600002&r=1&w=2 (I transcribed this from the picture) Pid: 518, comm: btrfs-endio-met Tainted: G W 3.7.0-rc8-00014-g27d7c2a RIP: 0010:[<fffffffff8136c186>] [<ffffffff8136c186>] memcpy+0x6/0x110 [...] Call Trace: [<ffffffff812aa8a3>] ? read_extent_buffer+0xc3/0x120 [<ffffffff812a15e2>] btrfs_node_key+0x22/0x30 [<ffffffff812dd150>] __readahead_hook.isra.5+0x3a0/0x3f0 [<ffffffff812dd544>] btree_readahead_hook+0x24/0x40 [<ffffffff81285a99>] btree_readpage_end_io_hook+0x139/0x290 [...] Looking at the email thread, I can''t seem to be able to find a happy ending. It looks like this could still be open. Here''s my btrfs device: Label: none uuid: 02c68790-78a5-418a-901c-59eda67b0e6e Total devices 1 FS bytes used 197.60GB devid 1 size 460.41GB used 289.07GB path /dev/dm-2 The output of fi df /: Data: total=269.00GB, used=195.88GB System, DUP: total=32.00MB, used=32.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=10.00GB, used=1.71GB Anything I could do to help diagnose the issue? I''ve seen Chris suggested to enable slub (I use slub, not slab) and pagealloc debug. What else? -- Mathieu Chouquet-Stringer mathieu@csetco.com The sun itself sees not till heaven clears. -- William Shakespeare -- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mathieu Chouquet-Stringer
2012-Dec-10 22:21 UTC
Re: GPF in read_extent_buffer while scrubbing on 3.7.0-rc8-00014-g27d7c2a
Hi again, after enabling page alloc and slub debug, I was able to capture an error followed by the "usual" GPF. More below. On Thu, Dec 06, 2012 at 03:34:58PM +0100, Mathieu Chouquet-Stringer wrote:> Using the last couple of kernels (3.6 or 3.7), scrubbing my btrfs fs (which is > on a luks based lvm device) will always end up crashing my pc. > [...]The warning: WARNING: at fs/btrfs/extent_io.c:4680 read_extent_buffer+0xee/0x120() Hardware name: 2392CTO Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT lockd sunrpc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack rfcomm bnep arc4 iwldvm mac80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek coretemp kvm_intel kvm microcode uvcvideo videobuf2_vmalloc videobuf2_memops pcspkr videobuf2_core videodev btusb media i2c_i801 bluetooth snd_hda_intel snd_hda_codec cdc_ncm usbnet mii cdc_wdm cdc_acm snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core iwlwifi cfg80211 e1000e snd_page_alloc mei snd_timer thinkpad_acpi snd soundcore rfkill uinput dm_crypt crc32c_intel nouveau ghash_clmulni_intel mxm_wmi ttm i915 i2c_alg o_bit drm_kms_helper firewire_ohci drm sdhci_pci firewire_core sdhci mmc_core crc_itu_t i2c_core wmi video Pid: 536, comm: btrfs-endio-met Not tainted 3.7.0-rc8-00039-ged23ec4 #4 Call Trace: [<ffffffff8104bebf>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8104bf1a>] warn_slowpath_null+0x1a/0x20 [<ffffffff812aae9e>] read_extent_buffer+0xee/0x120 [<ffffffff812a1bb2>] btrfs_node_key+0x22/0x30 [<ffffffff812dd720>] __readahead_hook.isra.5+0x3a0/0x3f0 [<ffffffff812ddb14>] btree_readahead_hook+0x24/0x40 [<ffffffff81286069>] btree_readpage_end_io_hook+0x139/0x290 [<ffffffff812a6ff3>] end_bio_extent_readpage+0xd3/0xa40 [<ffffffff812853a6>] ? end_workqueue_fn+0x36/0x50 [<ffffffff811b27ed>] bio_endio+0x1d/0x30 [<ffffffff812853b1>] end_workqueue_fn+0x41/0x50 [<ffffffff812b5916>] worker_loop+0x136/0x580 [<ffffffff812b57e0>] ? btrfs_queue_worker+0x300/0x300 [<ffffffff8106f480>] kthread+0xc0/0xd0 [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 [<ffffffff8168e8ec>] ret_from_fork+0x7c/0xb0 [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 gdb says it''s this WARN ON: (gdb) l *(read_extent_buffer + 0xee) 0x726e is in read_extent_buffer (fs/btrfs/extent_io.c:4680). 4675 char *kaddr; 4676 char *dst = (char *)dstv; 4677 size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1); 4678 unsigned long i = (start_offset + start) >> PAGE_CACHE_SHIFT; 4679 4680 WARN_ON(start > eb->len); 4681 WARN_ON(start + len > eb->start + eb->len); 4682 4683 offset = (start_offset + start) & ((unsigned long)PAGE_CACHE_SIZE - 1); 4684 The GPF which follows right after is: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT lockd sunrpc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack rfcomm bnep arc4 iwldvm mac80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek coretemp kvm_intel kvm microcode uvcvideo videobuf2_vmalloc videobuf2_memops pcspkr videobuf2_core videodev btusb media i2c_i801 bluetooth snd_hda_intel snd_hda_codec cdc_ncm usbnet mii cdc_wdm cdc_acm snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core iwlwifi cfg80211 e1000e snd_page_alloc mei snd_timer thinkpad_acpi snd soundcore rfkill uinput dm_crypt crc32c_intel nouveau ghash_clmulni_intel mxm_wmi ttm i915 i2c_alg o_bit drm_kms_helper firewire_ohci drm sdhci_pci firewire_core sdhci mmc_core crc_itu_t i2c_core wmi video CPU 1 Pid: 536, comm: btrfs-endio-met Tainted: G W 3.7.0-rc8-00039-ged23ec4 #4 LENOVO 2392CTO/2392CTO RIP: 0010:[<ffffffff8136c756>] [<ffffffff8136c756>] memcpy+0x6/0x110 RSP: 0018:ffff8804215a9b70 EFLAGS: 00010207 RAX: ffff8804215a9c57 RBX: 0000000000000011 RCX: 0000000000000011 RDX: 0000000000000011 RSI: 0005080000000003 RDI: ffff8804215a9c57 RBP: ffff8804215a9bb8 R08: 0000000000000000 R09: 0000000000000486 R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff8804215a9c68 R13: ffff8803f4c5cfc0 R14: 0000000000000048 R15: 0000000000000011 FS: 0000000000000000(0000) GS:ffff88043e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe992197000 CR3: 0000000001c0b000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process btrfs-endio-met (pid: 536, threadinfo ffff8804215a8000, task ffff8804224eb0e0) Stack: ffffffff812aae73 ffff8804105ea000 0000000000008003 0000000000001000 ffff8804215a9c20 00000000000003dd ffff88041232a7c0 ffff8804215a9c20 aeb3f3fe4e814d8e ffff8804215a9bc8 ffffffff812a1bb2 ffff8804215a9c98 Call Trace: [<ffffffff812aae73>] ? read_extent_buffer+0xc3/0x120 [<ffffffff812a1bb2>] btrfs_node_key+0x22/0x30 [<ffffffff812dd720>] __readahead_hook.isra.5+0x3a0/0x3f0 [<ffffffff812ddb14>] btree_readahead_hook+0x24/0x40 [<ffffffff81286069>] btree_readpage_end_io_hook+0x139/0x290 [<ffffffff812a6ff3>] end_bio_extent_readpage+0xd3/0xa40 [<ffffffff812853a6>] ? end_workqueue_fn+0x36/0x50 [<ffffffff811b27ed>] bio_endio+0x1d/0x30 [<ffffffff812853b1>] end_workqueue_fn+0x41/0x50 [<ffffffff812b5916>] worker_loop+0x136/0x580 [<ffffffff812b57e0>] ? btrfs_queue_worker+0x300/0x300 [<ffffffff8106f480>] kthread+0xc0/0xd0 [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 [<ffffffff8168e8ec>] ret_from_fork+0x7c/0xb0 [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 Code: 74 13 48 8b 43 58 48 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8 eb fb ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b RIP [<ffffffff8136c756>] memcpy+0x6/0x110 RSP <ffff8804215a9b70> memcpy is called from here: (gdb) l *(read_extent_buffer + 0xc3) 0x7243 is in read_extent_buffer (fs/btrfs/extent_io.c:4694). 4689 kaddr = page_address(page); 4690 memcpy(dst, kaddr + offset, cur); 4691 4692 dst += cur; 4693 len -= cur; 4694 offset = 0; 4695 i++; 4696 } 4697 } 4698 Any idea what to do next? If you think I should be adding debug printk statements, feel free to send them my way. -- Mathieu Chouquet-Stringer mathieu@csetco.com The sun itself sees not till heaven clears. -- William Shakespeare --
Mathieu Chouquet-Stringer
2012-Dec-14 18:11 UTC
Re: GPF in read_extent_buffer while scrubbing on 3.7.0-rc8-00014-g27d7c2a
Hi, Anyone has any idea about what I should try next regarding this bug? On Mon, Dec 10, 2012 at 11:21:25PM +0100, Mathieu Chouquet-Stringer wrote:> after enabling page alloc and slub debug, I was able to capture an error > followed by the "usual" GPF. More below. > > On Thu, Dec 06, 2012 at 03:34:58PM +0100, Mathieu Chouquet-Stringer wrote: > > Using the last couple of kernels (3.6 or 3.7), scrubbing my btrfs fs (which is > > on a luks based lvm device) will always end up crashing my pc. > > [...] > > The warning: > WARNING: at fs/btrfs/extent_io.c:4680 read_extent_buffer+0xee/0x120() > Hardware name: 2392CTO > Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT lockd sunrpc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack rfcomm bnep arc4 iwldvm mac80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek coretemp kvm_intel kvm microcode uvcvideo videobuf2_vmalloc videobuf2_memops pcspkr videobuf2_core videodev btusb media i2c_i801 bluetooth snd_hda_intel snd_hda_codec cdc_ncm usbnet mii cdc_wdm cdc_acm snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core iwlwifi cfg80211 e1000e snd_page_alloc mei snd_timer thinkpad_acpi snd soundcore rfkill uinput dm_crypt crc32c_intel nouveau ghash_clmulni_intel mxm_wmi ttm i915 i2c_algo_bit drm_kms_helper firewire_ohci drm sdhci_pci firewire_core sdhci mmc_core crc_itu_t i2c_core wmi video> Pid: 536, comm: btrfs-endio-met Not tainted 3.7.0-rc8-00039-ged23ec4 #4 > Call Trace: > [<ffffffff8104bebf>] warn_slowpath_common+0x7f/0xc0 > [<ffffffff8104bf1a>] warn_slowpath_null+0x1a/0x20 > [<ffffffff812aae9e>] read_extent_buffer+0xee/0x120 > [<ffffffff812a1bb2>] btrfs_node_key+0x22/0x30 > [<ffffffff812dd720>] __readahead_hook.isra.5+0x3a0/0x3f0 > [<ffffffff812ddb14>] btree_readahead_hook+0x24/0x40 > [<ffffffff81286069>] btree_readpage_end_io_hook+0x139/0x290 > [<ffffffff812a6ff3>] end_bio_extent_readpage+0xd3/0xa40 > [<ffffffff812853a6>] ? end_workqueue_fn+0x36/0x50 > [<ffffffff811b27ed>] bio_endio+0x1d/0x30 > [<ffffffff812853b1>] end_workqueue_fn+0x41/0x50 > [<ffffffff812b5916>] worker_loop+0x136/0x580 > [<ffffffff812b57e0>] ? btrfs_queue_worker+0x300/0x300 > [<ffffffff8106f480>] kthread+0xc0/0xd0 > [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 > [<ffffffff8168e8ec>] ret_from_fork+0x7c/0xb0 > [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 > > gdb says it''s this WARN ON: > (gdb) l *(read_extent_buffer + 0xee) > 0x726e is in read_extent_buffer (fs/btrfs/extent_io.c:4680). > 4675 char *kaddr; > 4676 char *dst = (char *)dstv; > 4677 size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1); > 4678 unsigned long i = (start_offset + start) >> PAGE_CACHE_SHIFT; > 4679 > > 4680 WARN_ON(start > eb->len); > > 4681 WARN_ON(start + len > eb->start + eb->len); > 4682 > 4683 offset = (start_offset + start) & ((unsigned long)PAGE_CACHE_SIZE - 1); > 4684 > > The GPF which follows right after is: > general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC > Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT lockd sunrpc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack rfcomm bnep arc4 iwldvm mac80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek coretemp kvm_intel kvm microcode uvcvideo videobuf2_vmalloc videobuf2_memops pcspkr videobuf2_core videodev btusb media i2c_i801 bluetooth snd_hda_intel snd_hda_codec cdc_ncm usbnet mii cdc_wdm cdc_acm snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core iwlwifi cfg80211 e1000e snd_page_alloc mei snd_timer thinkpad_acpi snd soundcore rfkill uinput dm_crypt crc32c_intel nouveau ghash_clmulni_intel mxm_wmi ttm i915 i2c_algo_bit drm_kms_helper firewire_ohci drm sdhci_pci firewire_core sdhci mmc_core crc_itu_t i2c_core wmi video> CPU 1 > Pid: 536, comm: btrfs-endio-met Tainted: G W 3.7.0-rc8-00039-ged23ec4 #4 LENOVO 2392CTO/2392CTO > RIP: 0010:[<ffffffff8136c756>] [<ffffffff8136c756>] memcpy+0x6/0x110 > RSP: 0018:ffff8804215a9b70 EFLAGS: 00010207 > RAX: ffff8804215a9c57 RBX: 0000000000000011 RCX: 0000000000000011 > RDX: 0000000000000011 RSI: 0005080000000003 RDI: ffff8804215a9c57 > RBP: ffff8804215a9bb8 R08: 0000000000000000 R09: 0000000000000486 > R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff8804215a9c68 > R13: ffff8803f4c5cfc0 R14: 0000000000000048 R15: 0000000000000011 > FS: 0000000000000000(0000) GS:ffff88043e240000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fe992197000 CR3: 0000000001c0b000 CR4: 00000000001407e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process btrfs-endio-met (pid: 536, threadinfo ffff8804215a8000, task ffff8804224eb0e0) > Stack: > ffffffff812aae73 ffff8804105ea000 0000000000008003 0000000000001000 > ffff8804215a9c20 00000000000003dd ffff88041232a7c0 ffff8804215a9c20 > aeb3f3fe4e814d8e ffff8804215a9bc8 ffffffff812a1bb2 ffff8804215a9c98 > Call Trace: > [<ffffffff812aae73>] ? read_extent_buffer+0xc3/0x120 > [<ffffffff812a1bb2>] btrfs_node_key+0x22/0x30 > [<ffffffff812dd720>] __readahead_hook.isra.5+0x3a0/0x3f0 > [<ffffffff812ddb14>] btree_readahead_hook+0x24/0x40 > [<ffffffff81286069>] btree_readpage_end_io_hook+0x139/0x290 > [<ffffffff812a6ff3>] end_bio_extent_readpage+0xd3/0xa40 > [<ffffffff812853a6>] ? end_workqueue_fn+0x36/0x50 > [<ffffffff811b27ed>] bio_endio+0x1d/0x30 > [<ffffffff812853b1>] end_workqueue_fn+0x41/0x50 > [<ffffffff812b5916>] worker_loop+0x136/0x580 > [<ffffffff812b57e0>] ? btrfs_queue_worker+0x300/0x300 > [<ffffffff8106f480>] kthread+0xc0/0xd0 > [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 > [<ffffffff8168e8ec>] ret_from_fork+0x7c/0xb0 > [<ffffffff8106f3c0>] ? kthread_create_on_node+0x120/0x120 > Code: 74 13 48 8b 43 58 48 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8 eb fb ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b > RIP [<ffffffff8136c756>] memcpy+0x6/0x110 > RSP <ffff8804215a9b70> > > memcpy is called from here: > (gdb) l *(read_extent_buffer + 0xc3) > 0x7243 is in read_extent_buffer (fs/btrfs/extent_io.c:4694). > 4689 kaddr = page_address(page); > 4690 memcpy(dst, kaddr + offset, cur); > 4691 > 4692 dst += cur; > 4693 len -= cur; > 4694 offset = 0; > 4695 i++; > 4696 } > 4697 } > 4698 > > Any idea what to do next? If you think I should be adding debug printk > statements, feel free to send them my way.-- Mathieu Chouquet-Stringer mathieu@csetco.com The sun itself sees not till heaven clears. -- William Shakespeare --
Jan Schmidt
2013-Jan-03 17:26 UTC
Re: GPF in read_extent_buffer while scrubbing on 3.7.0-rc8-00014-g27d7c2a
Hi Mathieu, Sorry for the late reply. I had quite a good reproducer once for what I suspect may be your problem here - but it suddenly stopped reproducing the problem and I still haven''t figured out why. (see https://patchwork.kernel.org/patch/1773611/ if you''re interested) Can you please give the following patch a try and report back if it helps you (apply on top of cmason/for-linus, 57ba86c)? If it doesn''t, you''ve got three choices: a) forget about the problem b) send me a dump of your whole file system (space can be provided) c) receive debug patches, apply them and send me the output Thanks, -Jan ------>8--------- diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 65f0367..d51185e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3324,8 +3324,6 @@ int close_ctree(struct btrfs_root *root) btrfs_dev_replace_suspend_for_unmount(fs_info); - btrfs_scrub_cancel(fs_info); - /* wait for any defraggers to finish */ wait_event(fs_info->transaction_wait, (atomic_read(&fs_info->defrag_running) == 0)); @@ -3392,6 +3390,7 @@ int close_ctree(struct btrfs_root *root) btrfs_stop_workers(&fs_info->caching_workers); btrfs_stop_workers(&fs_info->readahead_workers); btrfs_stop_workers(&fs_info->flush_workers); + btrfs_scrub_cancel(fs_info); #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY if (btrfs_test_opt(root, CHECK_INTEGRITY)) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html