Sami Liedes
2012-Jul-02 23:01 UTC
btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
Hi, I just got this oops on a computer running 3.4.2. A few minutes before I had started "btrfs device scrub /" and had a watcher process running "btrfs scrub status /" every 5 seconds. After a few gigabytes of scrubbing, I got this crash. The oops is transcribed from photos, so it may contain some errors. I tried to be careful, and double checked the backtrace. Sami ------------------------------------------------------------ general protection fault: 0000 [#1] SMP CPU 4 Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables ebtable_nat ebtables cn rfcomm bnep parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm acpi_cpufreq snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmi di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev mperf pci_hotplug i2c_i801 i2c_core processor button intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod nbd btrfs libcrc32c zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata aes_generic cryptd scsi_mod e1000e thermal fa n thermal_sys [last unloaded: scsi_wait_scan] Pid: 30863, comm: btrfs-endio-met Tainted: G W 3.4.2 #1 System manufacturer System Product 
Name/P8P67 EVO RIP: 0010:[<ffffffff811e83bd>] [<ffffffff811e83bd>] memcpy+0xd/0x110 RSP: 0000:ffff88003174dba8 EFLAGS: 00010202 RAX: ffff88003174dc8f RBX: 0000000000000011 RCX: 0000000000000002 RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff88003174dc8f RBP: ffff88003174dbf0 R08: 000000000000000a R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003174dca0 R13: ffff8800659f42b0 R14: 0000000000000048 R15: 0000000000000011 FS: 0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000973c000 CR3: 0000000167ef3000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process btrfs-endio-met (pid: 30863, threadinfo ffff88003174c000, task ffff88006f818000) Stack: ffffffffa026bd6b ffff8801960f5000 0000000000008003 0000000000001000 ffff88003174dc58 00000000000003dd ffff88000ac13c60 ffff88003174dc58 696f70203a61685f ffff88003174dc00 ffffffffa026904d ffff88003174dcd0 Call Trace: [<ffffffffa026bd6b>] ? read_extent_buffer+0xbb/0x110 [btrfs] [<ffffffffa026304d>] btrfs_node_key+0x1d/0x20 [btrfs] [<ffffffffa02994e0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] [<ffffffffa029986f>] btree_readahead_hook+0x1f/0x40 [btrfs] [<ffffffffa023f841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [<ffffffffa0267452>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [<ffffffffa026809b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [<ffffffffa023ee61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [<ffffffff81158958>] bio_endio+0x18/0x30 [<ffffffffa023ee6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [<ffffffffa0275857>] worker_loop+0x157/0x560 [btrfs] [<ffffffffa0275700>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [<ffffffff81418fe4>] kernel_thread_helper+0x4/0x10 [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [<ffffffff81418fe0>] ? 
gs_change+0x13/0x13 Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90... 8 4c 8b 56 10 4c RIP [<ffffffff811e83bd>] memcpy+0xd/0x110 RSP <ffff88003174dba8>
Sami Liedes
2012-Jul-02 23:08 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, Jul 03, 2012 at 02:01:21AM +0300, Sami Liedes wrote:
> The oops is transcribed from photos, so it may contain some errors. I
> tried to be careful, and double checked the backtrace.

Forgot to mention, the fs is a two-partition filesystem, with both data and metadata mirrored.

------------------------------------------------------------
Label: none  uuid: f26f08b1-89e6-4f2d-b7f9-857010dd2517
	Total devices 2 FS bytes used 778.31GB
	devid 2 size 1.07TB used 1.05TB path /dev/dm-6
	devid 1 size 1.07TB used 1.05TB path /dev/dm-5
------------------------------------------------------------

Sami
Jan Schmidt
2012-Jul-03 13:11 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
> I just got this oops on a computer running 3.4.2.
>
> A few minutes before I had started "btrfs device scrub /" and had a
> watcher process running "btrfs scrub status /" every 5 seconds. After
> a few gigabytes of scrubbing, I got this crash.
>
> The oops is transcribed from photos, so it may contain some errors. I

You did *what*? :-) Uploading a photo would be fine, just in case that's easier for you the next time.

> tried to be careful, and double checked the backtrace.
>
> Sami
>
> ------------------------------------------------------------ > general protection fault: 0000 [#1] SMP > CPU 4 > Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables ebtable_nat ebtables cn rfcomm bnep > parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm acpi_cpufreq snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmi > di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev mperf pci_hotplug i2c_i801 i2c_core processor button > intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod nbd btrfs libcrc32c > zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata aes_generic cryptd scsi_mod e1000e thermal 
fa > n thermal_sys [last unloaded: scsi_wait_scan] > > Pid: 30863, comm: btrfs-endio-met Tainted: G W 3.4.2 #1 System manufacturer System Product Name/P8P67 EVO > RIP: 0010:[<ffffffff811e83bd>] [<ffffffff811e83bd>] memcpy+0xd/0x110 > RSP: 0000:ffff88003174dba8 EFLAGS: 00010202 > RAX: ffff88003174dc8f RBX: 0000000000000011 RCX: 0000000000000002 > RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff88003174dc8f > RBP: ffff88003174dbf0 R08: 000000000000000a R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003174dca0 > R13: ffff8800659f42b0 R14: 0000000000000048 R15: 0000000000000011 > FS: 0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 000000000973c000 CR3: 0000000167ef3000 CR4: 00000000000407e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process btrfs-endio-met (pid: 30863, threadinfo ffff88003174c000, task ffff88006f818000) > Stack: > ffffffffa026bd6b ffff8801960f5000 0000000000008003 0000000000001000 > ffff88003174dc58 00000000000003dd ffff88000ac13c60 ffff88003174dc58 > 696f70203a61685f ffff88003174dc00 ffffffffa026904d ffff88003174dcd0 > Call Trace: > [<ffffffffa026bd6b>] ? read_extent_buffer+0xbb/0x110 [btrfs] > [<ffffffffa026304d>] btrfs_node_key+0x1d/0x20 [btrfs] > [<ffffffffa02994e0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] > [<ffffffffa029986f>] btree_readahead_hook+0x1f/0x40 [btrfs] > [<ffffffffa023f841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] > [<ffffffffa0267452>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] > [<ffffffffa026809b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] > [<ffffffffa023ee61>] ? end_workqueue_fn+0x31/0x50 [btrfs] > [<ffffffff81158958>] bio_endio+0x18/0x30 > [<ffffffffa023ee6c>] end_workqueue_fn+0x3c/0x50 [btrfs] > [<ffffffffa0275857>] worker_loop+0x157/0x560 [btrfs] > [<ffffffffa0275700>] ? 
btrfs_queue_worker+0x310/0x310 [btrfs] > [<ffffffff81058e5e>] kthread+0x8e/0xa0 > [<ffffffff81418fe4>] kernel_thread_helper+0x4/0x10 > [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 > [<ffffffff81418fe0>] ? gs_change+0x13/0x13 > Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90... > 8 4c 8b 56 10 4c > RIP [<ffffffff811e83bd>] memcpy+0xd/0x110 > RSP <ffff88003174dba8>

That's looking strange.

I checked the readahead code again: It deliberately skips locking and uses btrfs_node_key with a counter variable. This means we might end up reading a key that's no longer actually there. However, it only operates on nodes of trees, not leaves. Node entries have a fixed size, so no matter what changes in the node, you won't reach behind the end of that node with an index that was valid the moment before.

As far as I see it, that algorithm is safe. It could miss some keys or do some extra work that's not strictly required, but it should never reach a GPF from btrfs_node_key.

If no other ideas come up, I'd try memtesting that machine.

-Jan
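The fixed-size argument above can be put in numbers. What follows is only a minimal userspace sketch, assuming the usual btrfs on-disk sizes (a 101-byte node header and a 33-byte key pointer). The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

enum {
	NODE_HEADER_SIZE = 101, /* assumed sizeof(struct btrfs_header) */
	KEY_PTR_SIZE = 33,      /* assumed sizeof(struct btrfs_key_ptr) */
};

/* Hypothetical helper: how many key pointers fit in one node. */
static uint32_t node_capacity(uint32_t nodesize)
{
	assert(nodesize > NODE_HEADER_SIZE);
	return (nodesize - NODE_HEADER_SIZE) / KEY_PTR_SIZE;
}

/* Hypothetical helper: byte offset one past the end of slot `slot`. */
static uint64_t slot_end_offset(uint32_t slot)
{
	return NODE_HEADER_SIZE + ((uint64_t)slot + 1) * KEY_PTR_SIZE;
}
```

For example, with a 32768-byte node, node_capacity() gives 989 slots, and the last valid slot (988) ends at byte 32738, inside the buffer. The argument therefore holds only as long as the slot index is bounded by a trustworthy item count.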
Sami Liedes
2012-Jul-03 13:14 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, Jul 03, 2012 at 02:01:21AM +0300, Sami Liedes wrote:
> I just got this oops on a computer running 3.4.2.

I now repeated this on 3.4.4. Merely running a "btrfs scrub start /" causes this after a couple of minutes of running. This time I didn't run "btrfs scrub status /" in a loop, so that's not the cause. And judging from the two tries and two crashes, this seems very repeatable.

The backtrace is basically exactly the same as with 3.4.2.

Sami
Sami Liedes
2012-Jul-03 13:58 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, Jul 03, 2012 at 03:11:04PM +0200, Jan Schmidt wrote:
> > The oops is transcribed from photos, so it may contain some errors. I
>
> You did *what*? :-) Uploading a photo would be fine, just in case that's easier
> for you the next time.

Nah, it's sometimes refreshing to do something that doesn't take too much thinking for a while. Besides, I fault myself for not having had netconsole logging on that machine and needed to punish myself for that omission ;)

> That's looking strange.
>
> I checked the readahead code again: It deliberately skips locking and uses
> btrfs_node_key with a counter variable. This means, we might end up reading a
> key that's no longer actually there. However, it only operates on nodes of
> trees, not leaves. Node entries have a fixed size, so no matter what changes in
> the node, you won't reach behind the end of that node with an index that was
> valid the moment before.
>
> As far as I see it, that algorithm is safe. It could miss some keys or do some
> extra work that's not strictly required, but it should never reach a GPF from
> btrfs_node_key.
>
> If no other ideas come up, I'd try memtesting that machine.

Ran a full pass of memtest86+ v4.20, with no errors found. Also the machine works very well in all other respects under heavy load.

I think I might try setting up that netconsole to see if there are any interesting console messages before the oops... As I said, I also was able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce this at will.

Sami
Jan Schmidt
2012-Jul-03 14:35 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 03.07.2012 15:58, Sami Liedes wrote:
> I think I might try setting up that netconsole to see if there are any
> interesting console messages before the oops... As I said, I also was
> able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce
> this at will.

That would be helpful. It's good that it's reproducible. Please paste the entire output if possible.

If there's nothing immediately useful in there, I'll try to make up some debugging patches for that problem.

Thanks!
-Jan
Sami Liedes
2012-Jul-03 22:47 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, Jul 03, 2012 at 04:35:25PM +0200, Jan Schmidt wrote:
> On 03.07.2012 15:58, Sami Liedes wrote:
> > I think I might try setting up that netconsole to see if there are any
> > interesting console messages before the oops... As I said, I also was
> > able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce
> > this at will.
>
> That would be helpful. It's good that it's reproducible. Please paste the entire
> output if possible.
>
> If there's nothing immediately useful in there, I'll try to make up some
> debugging patches for that problem.

There's certainly something there; unfortunately netconsole eats most of it. (Is there a better way to capture full dmesg from panic? Assuming a computer without a serial port.)

I've seen this before: An overly long "Modules linked in:" line causes a large gap in netconsole output.

However, perhaps the most interesting piece is there:

[30846.822395] WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]()

Unfortunately, no backtrace for that :(

Ideas?
Sami ------------------------------------------------------------ [30638.258946] netpoll: netconsole: local port 4444 [30638.258950] netpoll: netconsole: local IP 192.168.1.2 [30638.258952] netpoll: netconsole: interface ''eth0'' [30638.258953] netpoll: netconsole: remote port 1194 [30638.258954] netpoll: netconsole: remote IP 192.168.1.73 [30638.258956] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c [30638.259002] console [netcon0] enabled [30638.259004] netconsole: network logging started [30846.822356] ------------[ cut here ]------------ [30846.822395] WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]() [30846.822400] Hardware name: System Product Name [30846.822402] Modules linked in: netconsole configfs tun cn ip6_tables ebtable_nat ebtables parport_pc ppdev lp parport bnep rfcomm cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative binfmt_misc nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm i2c_algo_bit loop fuse kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_usbmidi_lib snd_hda_codec snd_hwdep snd_pcm_oss joydev snd_mixer_oss snd_pcm snd_page_alloc eeepc_wmi90 90 90 48 89 f8 48 89 90 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c [30846.825219] RIP [<ffffffff811e84dd>] memcpy+0xd/0x110 [30846.825248] RSP <ffff88009ecebba8> xord1 raid6_pq md_mod nbd btrfs libcrc32c zlib_deflate sd_mod crc_t10dif ahci libahci crc32c_intel r8169 pci_hotplug bluetooth snd_seq_device snd_timer snd intel_agp acpi_cpufreq mperf mxm_wmi psmouse wmi coretemp soundcore processor serio_raw i2c_i801 iTCO_wdt video rfkill microcode crc16 evdev i2c_core button intel_gtt[30846.822679] [<ffffffffa0249841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] sha256_generic dm_crypt 
dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx[30846.822773] [<ffffffff81419120>] ? gs_change+0x13/0x13 md_mod nbd btrfs libcrc32c zlib_deflate sd_mod crc_t10dif ahci libahci raid6_pq r8169 mii firewire_ohci ghash_clmulni_intel firewire_core aesni_intel crc_itu_t aes_x86_64 bnep aes_generic cryptd scsi_mod e1000e fan thermal nfsd thermal_sys [last unloaded: netconsole] auth_rpcgss[30846.823704] Pid: 16224, comm: btrfs-endio-met Tainted: G W 3.4.4 #1 System manufacturer System Product Name/P8P67 EVO [30846.823767] RIP: 0010:[<ffffffff811e84dd>] [<ffffffff811e84dd>] memcpy+0xd/0x110 [30846.823805] RSP: 0000:ffff88009ecebba8 EFLAGS: 00010202 [30846.823830] RAX: ffff88009ecebc8f RBX: 0000000000000011 RCX: 0000000000000002 [30846.823865] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff88009ecebc8f [30846.823933] R10: ffff88001d2395c0 R11: ffff8801f198c780 R12: ffff88009ecebca0 [30846.823968] R13: ffff88008eb01178 R14: 0000000000000048 R15: 0000000000000011 [30846.824002] FS: 0000000000000000(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000 [30846.824042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [30846.824069] CR2: 000000000add813c CR3: 0000000163305000 CR4: 00000000000407e0 [30846.823694] [30846.824101] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [30846.824136] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [30846.824171] Process btrfs-endio-met (pid: 16224, threadinfo ffff88009ecea000, task ffff88020f3142f0) [30846.824225] ffffffffa0275d5b ffff880167d30000 0000000000008003 0000000000001000 [30846.824273] ffff88009ecebc58 00000000000003dd[30846.824214] Stack: ffff880099cdea20 [30846.824318] 6172745f69706d20 ffff88009ecebc00 ffffffffa026d03d ffff88009ecebc58 ffff88009ecebcd0 [30846.824363] Call Trace: [30846.824434] [<ffffffffa026d03d>] btrfs_node_key+0x1d/0x20 [btrfs] [30846.824474] [<ffffffffa02a34d0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] [30846.824517] 
[<ffffffffa02a385f>] btree_readahead_hook+0x1f/0x40 [btrfs] [30846.824556] [<ffffffffa0249841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [30846.824602] [<ffffffffa0271442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [30846.824647] [<ffffffffa027208b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [30846.824391] [<ffffffffa0275d5b>] ? read_extent_buffer+0xbb/0x110 [btrfs] [30846.824690] [<ffffffffa0248e61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [30846.824756] [<ffffffffa0248e6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [30846.824797] [<ffffffffa027f847>] worker_loop+0x157/0x560 [btrfs] [30846.824836] [<ffffffffa027f6f0>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [30846.824870] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [30846.824895] [<ffffffff81419124>] kernel_thread_helper+0x4/0x10 [30846.824925] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [30846.824957] [<ffffffff81419120>] ? gs_change+0x13/0x13 [30846.824983] Code: [30846.824724] [<ffffffff81158a78>] bio_endio+0x18/0x30 4e 83 c4 08 5b 5d c3 66 0f 1f 48 44 00 e8 eb fb 00 eb 90 90 90 90 90 [30846.835959] ---[ end trace e93713a9d40cd06f ]--- [30874.611121] SysRq : Resetting [30874.611134] ACPI MEMORY or I/O RESET_REG.
Sami Liedes
2012-Jul-04 00:17 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
> I've seen this before: An overly long "Modules linked in:" line causes
> a large gap in netconsole output.

I managed to capture the entire output using netconsole by modifying the kernel to not output the list of modules.

Sami

------------------------------------------------------------
[ 125.827919] netpoll: netconsole: local port 4444 [ 125.827946] netpoll: netconsole: local IP 192.168.1.2 [ 125.827969] netpoll: netconsole: interface 'eth0' [ 125.827990] netpoll: netconsole: remote port 1194 [ 125.828011] netpoll: netconsole: remote IP 192.168.1.73 [ 125.828034] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c [ 125.828169] console [netcon0] enabled [ 125.828193] netconsole: network logging started [ 247.787472] ------------[ cut here ]------------ [ 247.787536] WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]() [ 247.787573] Hardware name: System Product Name [ 247.787594] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 247.787632] Pid: 1146, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #1 [ 247.787674] Call Trace: [ 247.787692] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 [ 247.787794] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20 [ 247.787835] [<ffffffffa01d7d86>] read_extent_buffer+0xe6/0x110 [btrfs] [ 247.787877] [<ffffffffa01cf03d>] btrfs_node_key+0x1d/0x20 [btrfs] [ 247.787917] [<ffffffffa02054d0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] [ 247.787959] [<ffffffffa020585f>] btree_readahead_hook+0x1f/0x40 [btrfs] [ 247.787999] [<ffffffffa01ab841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [ 247.788043] [<ffffffffa01d3442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 247.788087] [<ffffffffa01d408b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 247.788129] [<ffffffffa01aae61>] ? 
end_workqueue_fn+0x31/0x50 [btrfs] [ 247.788161] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 247.788194] [<ffffffffa01aae6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 247.788234] [<ffffffffa01e1847>] worker_loop+0x157/0x560 [btrfs] [ 247.788272] [<ffffffffa01e16f0>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 247.788306] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 247.788332] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 247.788360] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 247.788389] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 247.788413] ---[ end trace e93713a9d40cd06e ]--- [ 247.788444] general protection fault: 0000 [#1] SMP [ 247.788473] CPU 5 [ 247.788484] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 247.788523] [ 247.788533] Pid: 1146, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #1 System manufacturer System Product Name/P8P67 EVO [ 247.788595] RIP: 0010:[<ffffffff811e83ed>] [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 247.788632] RSP: 0000:ffff8802152f3ba8 EFLAGS: 00010202 [ 247.788656] RAX: ffff8802152f3c8f RBX: 0000000000000011 RCX: 0000000000000002 [ 247.788687] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff8802152f3c8f [ 247.788719] RBP: ffff8802152f3bf0 R08: 0000000000000000 R09: 0000000000000000 [ 247.788750] R10: ffff880207dd89c0 R11: ffff8801f3a90780 R12: ffff8802152f3ca0 [ 247.788782] R13: ffff8801c1d5b830 R14: 0000000000000048 R15: 0000000000000011 [ 247.788814] FS: 0000000000000000(0000) GS:ffff88021ed40000(0000) knlGS:0000000000000000 [ 247.788850] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 247.788876] CR2: 00000000038942a0 CR3: 00000001c26ce000 CR4: 00000000000407e0 [ 247.788907] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 247.788939] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 247.788971] Process btrfs-endio-met (pid: 1146, threadinfo ffff8802152f2000, task ffff8801f4192ca0) [ 247.789009] Stack: [ 247.789021] ffffffffa01d7d5b ffff8802010b7000 
0000000000008003 0000000000001000 [ 247.789066] ffff8802152f3c58 00000000000003dd ffff88021032fa20 ffff8802152f3c58 [ 247.789112] 6669646e65230a29 ffff8802152f3c00 ffffffffa01cf03d ffff8802152f3cd0 [ 247.789157] Call Trace: [ 247.789184] [<ffffffffa01d7d5b>] ? read_extent_buffer+0xbb/0x110 [btrfs] [ 247.789226] [<ffffffffa01cf03d>] btrfs_node_key+0x1d/0x20 [btrfs] [ 247.789265] [<ffffffffa02054d0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] [ 247.789308] [<ffffffffa020585f>] btree_readahead_hook+0x1f/0x40 [btrfs] [ 247.789348] [<ffffffffa01ab841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [ 247.789391] [<ffffffffa01d3442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 247.789434] [<ffffffffa01d408b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 247.789475] [<ffffffffa01aae61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 247.789506] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 247.789539] [<ffffffffa01aae6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 247.789578] [<ffffffffa01e1847>] worker_loop+0x157/0x560 [btrfs] [ 247.789616] [<ffffffffa01e16f0>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 247.789648] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 247.789672] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 247.789700] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 247.789729] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 247.789753] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c [ 247.790092] RIP [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 247.790120] RSP <ffff8802152f3ba8> [ 247.790138] ---[ end trace e93713a9d40cd06f ]---
Jan Schmidt
2012-Jul-04 11:26 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 04.07.2012 02:17, Sami Liedes wrote:
> On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
>> I've seen this before: An overly long "Modules linked in:" line causes
>> a large gap in netconsole output.
>
> I managed to capture the entire output using netconsole by modifying
> the kernel to not output the list of modules.

Okay, thanks for the output. Can you please apply the patch below and capture especially the line printed before the "cut here" line?

Thanks!
-Jan

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9018a0..beabe99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4519,7 +4519,14 @@ void read_extent_buffer(struct extent_buffer *eb, void *dstv,
 	size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1);
 	unsigned long i = (start_offset + start) >> PAGE_CACHE_SHIFT;
 
-	WARN_ON(start > eb->len);
+	if (start > eb->len) {
+		printk(KERN_ERR "btrfs: invalid parameters for read_extent_buffer: start (%lu) > eb->len (%lu). eb start is %llu, level %d, generation %llu, nritems %d. len param %lu. debug %llu/%llu/%llu/%llu\n",
+			start, eb->len, eb->start, btrfs_header_level(eb),
+			btrfs_header_generation(eb), btrfs_header_nritems(eb),
+			len,
+			eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3]);
+		WARN_ON(1);
+	}
 	WARN_ON(start + len > eb->start + eb->len);
 
 	offset = (start_offset + start) & ((unsigned long)PAGE_CACHE_SIZE - 1);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b516c3b..1bbf823 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -164,6 +164,8 @@ struct extent_buffer {
 	wait_queue_head_t lock_wq;
 	struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES];
 	struct page **pages;
+
+	u64 debug[4];
 };
 
 static inline void extent_set_compress_type(unsigned long *bio_flags,
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ac5d010..d9c1146 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -168,10 +168,15 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 			struct btrfs_key key;
 			struct btrfs_key next_key;
 
+			eb->debug[0] = 1;
+			eb->debug[1] = i;
+			eb->debug[2] = nritems;
+			eb->debug[3] = generation;
 			btrfs_node_key_to_cpu(eb, &key, i);
-			if (i + 1 < nritems)
+			if (i + 1 < nritems) {
+				eb->debug[0] = 2;
 				btrfs_node_key_to_cpu(eb, &next_key, i + 1);
-			else
+			} else
 				next_key = re->top;
 			bytenr = btrfs_node_blockptr(eb, i);
 			n_gen = btrfs_node_ptr_generation(eb, i);
Sami Liedes
2012-Jul-04 16:03 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Wed, Jul 04, 2012 at 01:26:46PM +0200, Jan Schmidt wrote:
> On 04.07.2012 02:17, Sami Liedes wrote:
> > On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
> >> I've seen this before: An overly long "Modules linked in:" line causes
> >> a large gap in netconsole output.
> >
> > I managed to capture the entire output using netconsole by modifying
> > the kernel to not output the list of modules.
>
> Okay, thanks for the output. Can you please apply the patch below and capture
> especially the line printed before the "cut here" line?

Here you go.

Sami

------------------------------------------------------------
[ 121.524803] netpoll: netconsole: local port 4444 [ 121.524831] netpoll: netconsole: local IP 192.168.1.2 [ 121.524853] netpoll: netconsole: interface 'eth0' [ 121.524874] netpoll: netconsole: remote port 1194 [ 121.524894] netpoll: netconsole: remote IP 192.168.1.73 [ 121.524917] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c [ 121.525055] console [netcon0] enabled [ 121.525074] netconsole: network logging started [ 200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. 
debug 2/989/620178657/3144240307695375391 [ 200.980594] ------------[ cut here ]------------ [ 200.980644] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]() [ 200.980681] Hardware name: System Product Name [ 200.980701] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 200.980739] Pid: 1145, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2 [ 200.980774] Call Trace: [ 200.980792] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 [ 200.980821] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20 [ 200.980860] [<ffffffffa01d6e07>] read_extent_buffer+0x167/0x1a0 [btrfs] [ 200.980902] [<ffffffffa01ce03d>] btrfs_node_key+0x1d/0x20 [btrfs] [ 200.980941] [<ffffffffa020459f>] __readahead_hook.isra.5+0x3ff/0x460 [btrfs] [ 200.980982] [<ffffffffa020492f>] btree_readahead_hook+0x1f/0x40 [btrfs] [ 200.981022] [<ffffffffa01aa841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [ 200.981065] [<ffffffffa01d2442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 200.981109] [<ffffffffa01d308b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 200.981150] [<ffffffffa01a9e61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 200.981182] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 200.981214] [<ffffffffa01a9e6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 200.981253] [<ffffffffa01e08d7>] worker_loop+0x157/0x560 [btrfs] [ 200.981291] [<ffffffffa01e0780>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 200.981323] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 200.981348] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 200.981377] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 200.981406] [<ffffffff81419060>] ? 
gs_change+0x13/0x13 [ 200.981430] ---[ end trace e93713a9d40cd06e ]--- [ 200.981459] general protection fault: 0000 [#1] SMP [ 200.981487] CPU 2 [ 200.981498] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 200.981540] [ 200.981550] Pid: 1145, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO [ 200.981612] RIP: 0010:[<ffffffff811e83ed>] [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 200.981650] RSP: 0000:ffff8801f4bf7b68 EFLAGS: 00010202 [ 200.981675] RAX: ffff8801f4bf7c8f RBX: 0000000000000011 RCX: 0000000000000002 [ 200.981707] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff8801f4bf7c8f [ 200.981738] RBP: ffff8801f4bf7be0 R08: 0000000000000000 R09: 0000000000000000 [ 200.981769] R10: ffff8801f4bf7c8f R11: ffff8801f3930780 R12: ffff8801f4bf7ca0 [ 200.981800] R13: ffff8801f7286178 R14: 0000000000000048 R15: 0000000000000011 [ 200.981832] FS: 0000000000000000(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000 [ 200.981868] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 200.981894] CR2: 00000000f773a000 CR3: 000000020d0de000 CR4: 00000000000407e0 [ 200.981925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 200.981956] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 200.981988] Process btrfs-endio-met (pid: 1145, threadinfo ffff8801f4bf6000, task ffff8801f4ff42f0) [ 200.982026] Stack: [ 200.982038] ffffffffa01d6d5b ffff880124f72ce1 0000000000000011 0000000000000002 [ 200.982084] 00000000000003dd 0000000024f72ce1 2ba295e6a4a5d41f ffff88020dc65000 [ 200.982130] ffff8801f4bf7c8f 0000000000001000 ffff8801f4bf7c58 00000000000003dd [ 200.982175] Call Trace: [ 200.982201] [<ffffffffa01d6d5b>] ? 
read_extent_buffer+0xbb/0x1a0 [btrfs] [ 200.982243] [<ffffffffa01ce03d>] btrfs_node_key+0x1d/0x20 [btrfs] [ 200.982281] [<ffffffffa020459f>] __readahead_hook.isra.5+0x3ff/0x460 [btrfs] [ 200.982322] [<ffffffffa020492f>] btree_readahead_hook+0x1f/0x40 [btrfs] [ 200.982362] [<ffffffffa01aa841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [ 200.982405] [<ffffffffa01d2442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 200.982448] [<ffffffffa01d308b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 200.982489] [<ffffffffa01a9e61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 200.982519] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 200.982552] [<ffffffffa01a9e6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 200.982591] [<ffffffffa01e08d7>] worker_loop+0x157/0x560 [btrfs] [ 200.982628] [<ffffffffa01e0780>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 200.982660] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 200.982684] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 200.982713] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 200.982742] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 200.982766] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c [ 200.983037] RIP [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 200.983065] RSP <ffff8801f4bf7b68> [ 200.983087] ---[ end trace e93713a9d40cd06f ]--- [ 361.453631] SysRq : Resetting [ 361.453663] ACPI MEMORY or I/O RESET_REG.
Jan Schmidt
2012-Jul-04 16:38 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 04.07.2012 18:03, Sami Liedes wrote:
> Here you go.
>
> Sami
> [...]
> [ 200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391

Wow, that's strange. Can you repeat your test once or twice and paste that line, please? I'd like to get a feeling for whether the values are completely random.

Reading more of the readahead code now...

Thanks,
-Jan
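The fixed numbers in that line (start 32771, len 17, and the 989 in the debug output) are consistent with a loop over node slots running just past the end of a 32768-byte tree block. The Python sketch below works through the arithmetic. The struct sizes are assumptions taken from the btrfs on-disk format, not from this thread, and the helper names are made up for illustration.

```python
# Assumed on-disk sizes (not stated in the thread).
DISK_KEY_SIZE = 17                    # sizeof(struct btrfs_disk_key), assumed
KEY_PTR_SIZE = DISK_KEY_SIZE + 8 + 8  # key + blockptr + generation = 33, assumed
HEADER_SIZE = 101                     # sizeof(struct btrfs_header), assumed
NODE_SIZE = 32768                     # eb->len as printed in the log


def max_node_ptrs(node_size=NODE_SIZE):
    """Number of key pointers that fit in one node after the header."""
    return (node_size - HEADER_SIZE) // KEY_PTR_SIZE


def key_offset(slot):
    """Byte offset of the key stored in the given node slot."""
    return HEADER_SIZE + slot * KEY_PTR_SIZE


def first_rejected_slot(node_size=NODE_SIZE):
    """First slot whose key offset trips the logged 'start > eb->len' check."""
    slot = 0
    while key_offset(slot) <= node_size:
        slot += 1
    return slot


print(max_node_ptrs())         # pointers per 32768-byte node
print(first_rejected_slot())   # first slot rejected by the check
print(key_offset(first_rejected_slot()))
```

Under these assumptions a node holds at most 989 pointers, and the first slot whose key offset exceeds eb->len is slot 990 at offset 32771, which would explain why start and len are identical in every report while nritems is garbage.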
Sami Liedes
2012-Jul-04 20:24 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Wed, Jul 04, 2012 at 06:38:00PM +0200, Jan Schmidt wrote:
> > [ 200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391

Let's call this try 1. I ran it three more times, so below we have tries 2, 3 and 4.

> Wow, that's strange. Can you repeat your test once or twice and paste that line,
> please? I'd like to get a feeling if the values are completely random.

Curiously, it clearly takes longer for it to crash after starting the scrub each time I run it. Also, on try 4 I got an entirely different crash (backtrace below). Now it scrubs maybe the first 200G or so of both devices of the (raid-1) 2.2T filesystem before it crashes.

start and eb->len seem to be the same (32771 and 32768) every time. eb start varies, but there's some pattern if you view them in hex:

Try 1  20a5a660000
Try 2  20bb0018000
Try 3  20a8bc28000
Try 4  (no output, different crash)

The rest of the values seem to me to be completely different every time.

	Sami

Try 2:
------------------------------------------------------------
[12961.870107] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2249220784128, level 14, generation 2242260605927040034, nritems 117835525. len param 17. debug 2/989/117835525/2242260605927040034
[12961.870204] ------------[ cut here ]------------
[12961.870264] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()
[12961.870302] Hardware name: System Product Name
[12961.870322] Modules linked in: <omitted> [last unloaded: scsi_wait_scan]
[12961.870367] Pid: 1144, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2
[12961.870403] Call Trace:
[12961.870421] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0
[12961.870449] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20
[...]
------------------------------------------------------------ Try 3: ------------------------------------------------------------ [ 531.770984] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2244317708288, level 170, generation 13639284858109917187, nritems 6171943. len param 17. debug 2/989/6171943/13639284858109917187 [ 531.771081] ------------[ cut here ]------------ [ 531.771133] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]() [ 531.771169] Hardware name: System Product Name [ 531.771191] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 531.771229] Pid: 1132, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2 [ 531.771265] Call Trace: [ 531.771282] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 [...] ------------------------------------------------------------ Try 4: ------------------------------------------------------------ [ 95.933108] netconsole: network logging started [ 982.651987] unable to find logical 691402650139365534 len 32768 [ 982.652060] ------------[ cut here ]------------ [ 982.652085] kernel BUG at fs/btrfs/volumes.c:3725! 
[ 982.652109] invalid opcode: 0000 [#1] SMP [ 982.652138] CPU 4 [ 982.652149] Modules linked in: <omitted> [last unloaded: scsi_wait_scan] [ 982.652190] [ 982.652201] Pid: 1127, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO [ 982.652264] RIP: 0010:[<ffffffffa0257778>] [<ffffffffa0257778>] __btrfs_map_block+0x668/0x680 [btrfs] [ 982.652323] RSP: 0000:ffff8801f43cfa70 EFLAGS: 00010286 [ 982.652347] RAX: 0000000000000049 RBX: 09985b0c0e52109e RCX: 0000000000000082 [ 982.652379] RDX: 00000000000000e8 RSI: 0000000000000046 RDI: 0000000000000246 [ 982.652411] RBP: ffff8801f43cfb10 R08: 0000000000000000 R09: 0000000000000000 [ 982.652443] R10: ffff8801f1fbf680 R11: ffff8801f3b00780 R12: ffff8801f0026108 [ 982.652475] R13: 0000000000008000 R14: ffff8801f0026fe0 R15: ffff8801f43cfbb0 [ 982.652507] FS: 0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000 [ 982.652542] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 982.652568] CR2: 00000000f775c000 CR3: 00000002110a1000 CR4: 00000000000407e0 [ 982.652600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 982.652632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 982.652664] Process btrfs-endio-met (pid: 1127, threadinfo ffff8801f43ce000, task ffff8801f401c2f0) [ 982.652703] Stack: [ 982.652715] ffff8801f43cfa90 ffffffff811dc553 ffff88021ec0ce88 ffff88021ed0ce80 [ 982.652761] ffff8801f43cfb00 0000000000000086 ffff8801f43cfaf0 0000000000000086 [ 982.652807] 000000fdf43cfad0 ffff8801f43cfba8 0000002000000004 0000000000000004 [ 982.652852] Call Trace: [ 982.652869] [<ffffffff811dc553>] ? cpumask_next_and+0x23/0x40 [ 982.652897] [<ffffffff8111d7e2>] ? kmem_cache_alloc_trace+0xc2/0x100 [ 982.652940] [<ffffffffa025aac9>] btrfs_map_block+0x9/0x10 [btrfs] [ 982.652979] [<ffffffffa0280ad2>] reada_add_block+0x1c2/0x890 [btrfs] [ 982.653010] [<ffffffff8140a4a2>] ? 
__slab_free+0xde/0x24e [ 982.653045] [<ffffffffa0281447>] __readahead_hook.isra.5+0x2a7/0x460 [btrfs] [ 982.653086] [<ffffffffa028192f>] btree_readahead_hook+0x1f/0x40 [btrfs] [ 982.653126] [<ffffffffa0227841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] [ 982.653168] [<ffffffffa024f442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 982.653211] [<ffffffffa025008b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 982.653252] [<ffffffffa0226e61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 982.653284] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 982.653317] [<ffffffffa0226e6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 982.653355] [<ffffffffa025d8d7>] worker_loop+0x157/0x560 [btrfs] [ 982.653393] [<ffffffffa025d780>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 982.653427] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 982.653451] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 982.653480] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 982.653509] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 982.653533] Code: e6 89 c7 eb a4 0f 0b c7 45 c4 01 00 00 00 31 db e9 08 fd ff ff 0f 0b 49 8b 17 48 89 de 48 c7 c7 08 0c 29 a0 31 c0 e8 e3 09 1b e1 <0f> 0b 0f 0b 89 df e9 73 ff ff ff 66 66 66 66 2e 0f 1f 84 00 00 [ 982.653807] RIP [<ffffffffa0257778>] __btrfs_map_block+0x668/0x680 [btrfs] [ 982.653853] RSP <ffff8801f43cfa70> [ 982.653877] ---[ end trace e93713a9d40cd06e ]--- ------------------------------------------------------------
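The hex values listed for tries 1 to 3 can be reproduced from the decimal "eb start" values in the logs. The small Python sketch below does the conversion and also checks whether each address is a multiple of the 32768-byte eb->len; that alignment check is an extra observation for illustration, not something claimed in the thread, and the names are hypothetical.

```python
# "eb start" values copied from the try 1-3 log lines.
EB_STARTS = {
    1: 2243489562624,
    2: 2249220784128,
    3: 2244317708288,
}
TREE_BLOCK_SIZE = 32768  # eb->len as printed in the log


def describe(start):
    """Return (hex string without prefix, aligned to the tree block size?)."""
    return format(start, "x"), start % TREE_BLOCK_SIZE == 0


for attempt, start in sorted(EB_STARTS.items()):
    hex_start, aligned = describe(start)
    print(f"Try {attempt}: {hex_start} aligned={aligned}")
```

All three addresses are block-aligned, so the eb start values themselves look sane even though the header fields read from those blocks do not.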
Jan Schmidt
2012-Jul-05 13:41 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 04.07.2012 22:24, Sami Liedes wrote:
> On Wed, Jul 04, 2012 at 06:38:00PM +0200, Jan Schmidt wrote:
>>> [ 200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391
>
> Let's call this try 1. I ran it three more times, so below we have
> tries 2, 3 and 4.
>
>> Wow, that's strange. Can you repeat your test once or twice and paste that line,
>> please? I'd like to get a feeling if the values are completely random.
>
> Curiously, it clearly takes longer for it to crash after starting the
> scrub each time I run it. Also on try 4 I got an entirely different
> crash (backtrace below). Now it scrubs maybe the first 200G or so of
> both devices of the (raid-1) 2.2T filesystem before it crashes.

Can you double check that there's nothing about corrected errors in your logs? Scrub will correct errors along the way and log that. So maybe we've only a few tries left to find the root cause.

> start and eb->len seem to be the same (32771 and 32768) every time.

32768 is the tree block size in your setup.

> eb start varies, but there's some pattern if you view them in hex:
>
> Try 1 20a5a660000
> Try 2 20bb0018000
> Try 3 20a8bc28000
> Try 4 (no output, different crash)

I fail to identify a real pattern there.

> The rest of the values seem to me to be completely different every time.

Which is itself interesting.

> Try 4:

That one is not that much different. We read some garbage from a tree block and started the next readahead cycle for the alleged children. That way we came to a logical address that's out of bounds, instead of a logical address with even more garbage.

I'd like to see if you corrupted your trees on disk in a really strange manner (with matching checksums?), or if data comes from the disk intact and becomes damaged thereafter.

Could you store the output of
	btrfs-debug-tree /dev/[whatever]
before try number 5 and afterwards? It will be quite a lot if you've got a lot of files in there. Don't send it anywhere right now, just store it away if possible.

What I'd like to get in the next reply is the output of the attached patch, a single pass should do this time.

NB: As we already have check_leaf doing extra leaf checks after reading them, we should probably add something like check_node as a general measure to make btrfs more robust.

Thank you,
-Jan

---
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..34122c2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -316,6 +316,11 @@ static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 	return 0;
 }
 
+int btrfs_csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf)
+{
+	return csum_tree_block(root, buf, 1);
+}
+
 /*
  * we can't consider a given block up to date unless the transid of the
  * block matches the transid in the parent node's pointer. This is how we
@@ -471,6 +476,12 @@ static int check_tree_block_fsid(struct btrfs_root *root,
 		       (unsigned long long)btrfs_header_bytenr(eb),	\
 		       (unsigned long long)root->objectid, slot)
 
+#define CORRUPT_NODE(root, node, reason, ...)				\
+	printk(KERN_CRIT "btrfs: corrupt node block=%llu,"		\
+		"root=%llu: " reason,					\
+		(unsigned long long)btrfs_header_bytenr(node),		\
+		(unsigned long long)root->objectid, ##__VA_ARGS__)
+
 static noinline int check_leaf(struct btrfs_root *root,
 			       struct extent_buffer *leaf)
 {
@@ -532,6 +543,42 @@ static noinline int check_leaf(struct btrfs_root *root,
 	return 0;
 }
 
+static noinline int check_node(struct btrfs_root *root,
+			       struct extent_buffer *node)
+{
+	int i;
+	u32 nritems = btrfs_header_nritems(node);
+	u64 generation;
+
+	if (nritems == 0)
+		return 0;
+
+	if (nritems > BTRFS_NODEPTRS_PER_BLOCK(root)) {
+		CORRUPT_NODE(root, node, "nritems (%lu) too large (%lu)\n",
+			     (unsigned long)nritems,
+			     BTRFS_NODEPTRS_PER_BLOCK(root));
+		return -EIO;
+	}
+
+	if (node->len > root->nodesize) {
+		CORRUPT_NODE(root, node, "length (%lu) too large (%lu)\n",
+			     node->len, (unsigned long)root->nodesize);
+		return -EIO;
+	}
+
+	generation = btrfs_super_generation(root->fs_info->super_copy);
+	for (i = 0; i < nritems; i++) {
+		if (btrfs_node_ptr_generation(node, i) > generation) {
+			CORRUPT_NODE(root, node, "generation (%llu) too new in slot %d (maximum expected %llu)\n",
+				     btrfs_node_ptr_generation(node, i), i,
+				     generation);
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
 struct extent_buffer *find_eb_for_page(struct extent_io_tree *tree,
 				       struct page *page, int max_walk)
 {
@@ -634,6 +681,10 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end,
 		set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
 		ret = -EIO;
 	}
+	if (found_level != 0 && check_node(root, eb)) {
+		set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
+		ret = -EIO;
+	}
 
 	if (!ret)
 		set_extent_buffer_uptodate(eb);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index beabe99..099ce6e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4507,6 +4507,7 @@ unlock_exit:
 	return ret;
 }
 
+extern int btrfs_csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf);
 void read_extent_buffer(struct extent_buffer *eb, void *dstv,
 			unsigned long start,
 			unsigned long len)
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index d9c1146..b659c8d 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -103,6 +103,7 @@ static void __reada_start_machine(struct btrfs_fs_info *fs_info);
 static int reada_add_block(struct reada_control *rc, u64 logical,
 			   struct btrfs_key *top, int level, u64 generation);
 
+extern int btrfs_csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf);
 /* recurses */
 /* in case of err, eb might be NULL */
 static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
@@ -144,6 +145,10 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 
 	if (err == 0) {
 		nritems = level ? btrfs_header_nritems(eb) : 0;
+		if (level > BTRFS_MAX_LEVEL ||
+		    nritems > BTRFS_NODEPTRS_PER_BLOCK(root))
+			printk(KERN_ERR "btrfs: node seems invalid now. checksum ok = %d\n",
+				btrfs_csum_tree_block(root, eb));
 		generation = btrfs_header_generation(eb);
 		/*
 		 * FIXME: currently we just set nritems to 0 if this is a leaf,
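The condition this patch adds to __readahead_hook can be tried out on the header values already logged in tries 1 to 3. What follows is only a user-space Python approximation of that single condition, not kernel code: BTRFS_MAX_LEVEL = 8 is an assumed value of the kernel constant, 989 is taken from the second field of the earlier "debug 2/989/..." output on the assumption that it is BTRFS_NODEPTRS_PER_BLOCK for this filesystem, and the function name is hypothetical.

```python
BTRFS_MAX_LEVEL = 8   # assumed value of the kernel constant
MAX_NODE_PTRS = 989   # assumed to equal BTRFS_NODEPTRS_PER_BLOCK(root) here


def header_looks_invalid(level, nritems):
    """Mirror of the patch's check: level or nritems outside plausible bounds."""
    return level > BTRFS_MAX_LEVEL or nritems > MAX_NODE_PTRS


# (level, nritems) pairs copied from the "invalid parameters" lines.
LOGGED = {
    1: (26, 620178657),
    2: (14, 117835525),
    3: (170, 6171943),
}

for attempt, (level, nritems) in sorted(LOGGED.items()):
    print(f"Try {attempt}: invalid={header_looks_invalid(level, nritems)}")
```

Every logged header fails both bounds at once, so if the check behaves as sketched, the new printk should fire before each crash and report whether the block's checksum still verifies at that point.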
Sami Liedes
2012-Jul-05 23:47 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Thu, Jul 05, 2012 at 03:41:33PM +0200, Jan Schmidt wrote:
> I'd like to see if you corrupted your trees on disk in a really strange
> manner (with matching checksums?), or if data comes from the disk intact
> and becomes damaged thereafter.
>
> Could you store the output of
> btrfs-debug-tree /dev/[whatever]
> before try number 5 and afterwards? It will be quite a lot if you've got
> a lot of files in there. Don't send it anywhere right now, just store it
> away if possible.

Ok, I have it stored. I mounted the fs read-only to do that just to be safe, because when I did it on a live fs I got some (~15) "parent transid verify failed" messages.

> Can you double check that there's nothing about corrected errors in your
> logs? Scrub will correct errors along the way and log that. So maybe
> we've only a few tries left to find the root cause.

Nope, definitely nothing there. A few "unlinked orphans" messages at mount time, but that's all. I'm sure I would have captured any further messages on my netconsole. Also nothing in any earlier logs that I can see.

Well, until now (try 6) anyway. I actually ran btrfs scrub / twice now after running btrfs-debug-tree (tries 5 and 6), because on the first try I ran on the wrong kernel (without the debug patch). On try 5 it crashed very quickly, probably in less than a second. I also noticed that it doesn't actually crash the machine or even make the filesystem unusable in any way (apart from unmount blocking at shutdown), so I also have the same oopses in my log files stored on the same filesystem. But no, nothing about corrected errors or checksum failures.

So: Try 5 (without your new patch, but after running btrfs-debug-tree); no checksum failures as before, similar crash to before:

------------------------------------------------------------
[ 2036.512656] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239379931136, level 102, generation 309534250463128, nritems 56187918. len param 17. debug 2/989/56187918/309534250463128
[ 2036.512770] ------------[ cut here ]------------
[ 2036.512834] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()
[ 2036.512880] Hardware name: System Product Name
[ 2036.512906] Modules linked in: <omitted> [last unloaded: netconsole]
[ 2036.512954] Pid: 1135, comm: btrfs-endio-met Tainted: G W 3.4.4-modded-oops+ #2
[...]
------------------------------------------------------------

Try 6, with your new patch - as a new thing, got checksum errors, including one very slightly before the crash and with the same "eb start" as in the "invalid parameters" message. Crashed in a couple of minutes.

Now that I look at the log below it actually has two crashes in it, one of btrfs-endio-met pid 1212 and another one of btrfs-endio-met pid 1132 some 72 seconds later. I didn't notice that until after the fact.

dm-6 (the device with checksum failures) is on a disk on which I don't have anything else besides this partition, so I suppose it could be corrupting data without me having noticed. I know it used to work at one time, but that doesn't mean too much. Anyway, I'm sure I haven't seen any "checksum verify failed" messages until now. None in the logs either.

Ah, one thing I have forgotten to mention. I think I've been using this filesystem for maybe a month now. It started as a single-device filesystem, but I did restriping to raid-1 on June 12. dm-5 is the original device, while dm-6 (the one with the checksum failures in try 6) is the added one. I still have logs of the restriping and everything after that; grep reveals no unusual lines containing "btrfs" in all the logs since then, apart from the crashes. Besides the oopses and lines that mention btrfs just because of the kernel version 3.4.4+btrfsdebug, this is a summary of all the lines that mention btrfs:

btrfs: disk space caching is enabled
btrfs flagging fs with big metadata feature
btrfs: new size for /dev/mapper/rootvg-scratch_crypt is 751617179648
  * this device is not on the filesystem with problems - although I
    haven't tried scrubbing this filesystem
btrfs: unlinked 104 orphans (10 times with 4 different numbers)

And during restriping

btrfs: found 10008 extents (and similar)
btrfs: relocating block group 1019010351104 flags 1 (and similar)

And two different warnings in the last few days, but I think this is only because the debug patches changed the code:

WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]()
WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()

Another thing that comes to mind is that I might well be the only one who uses btrfs with raid-1 on two separate dm-crypted devices (contrary to the way suggested on Btrfs FAQ - in my experience that can actually speed up things, especially on computers without hardware crypto, because the kernel has only one kcryptd thread per dm-crypt device and therefore can only use a single core per device. Plus I get the benefits of btrfs "smart raid".).

So, try 6. The three first lines are from me starting btrfs-debug-tree again just as a quick way to verify that my netconsole setup was working.
The first checksum failure came pretty much instantly after starting scrub: ------------------------------------------------------------ [ 168.207151] device fsid f26f08b1-89e6-4f2d-b7f9-857010dd2517 devid 1 transid 151087 /dev/dm-5 [ 168.208541] device fsid f26f08b1-89e6-4f2d-b7f9-857010dd2517 devid 2 transid 151087 /dev/dm-6 [ 168.284668] device fsid ad75beab-b52b-4f63-9d14-956cc8e95fc7 devid 1 transid 5001 /dev/dm-9 [ 188.507789] btrfs: dm-6 checksum verify failed on 2237998628864 wanted C3D64F found DCBF0869 level 88 [ 188.507852] btrfs: node seems invalid now. checksum ok = 1 [ 262.706585] btrfs: dm-6 checksum verify failed on 2246859587584 wanted 3D013CF8 found AD06874A level 77 [ 262.706650] btrfs: node seems invalid now. checksum ok = 1 [ 262.706822] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2246859587584, level 77, generation 93067543380099401, nritems 21766576. len param 17. debug 2/989/21766576/93067543380099401 [ 262.706929] ------------[ cut here ]------------ [ 262.706992] WARNING: at fs/btrfs/extent_io.c:4529 read_extent_buffer+0x167/0x1a0 [btrfs]() [ 262.707051] Hardware name: System Product Name [ 262.707078] Modules linked in: <omitted> [last unloaded: netconsole] [ 262.707126] Pid: 1212, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug #1 [ 262.707169] Call Trace: [ 262.707193] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 [ 262.707229] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20 [ 262.707282] [<ffffffffa01d8fc7>] read_extent_buffer+0x167/0x1a0 [btrfs] [ 262.707339] [<ffffffffa01d01fd>] btrfs_node_key+0x1d/0x20 [btrfs] [ 262.707391] [<ffffffffa02067af>] __readahead_hook+0x44f/0x500 [btrfs] [ 262.707444] [<ffffffffa0206b78>] btree_readahead_hook+0x18/0x40 [btrfs] [ 262.707497] [<ffffffffa01ac9d1>] btree_readpage_end_io_hook+0x111/0x270 [btrfs] [ 262.707556] [<ffffffffa01d4602>] ? 
find_first_extent_bit_state+0x22/0x80 [btrfs] [ 262.707613] [<ffffffffa01d524b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 262.707668] [<ffffffffa01abe61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 262.707708] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 262.707753] [<ffffffffa01abe6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 262.707805] [<ffffffffa01e2a97>] worker_loop+0x157/0x560 [btrfs] [ 262.707856] [<ffffffffa01e2940>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 262.707898] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 262.707930] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 262.707966] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 262.708003] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 262.708034] ---[ end trace e93713a9d40cd06e ]--- [ 262.708072] general protection fault: 0000 [#1] SMP [ 262.708112] CPU 1 [ 262.708126] Modules linked in: <omitted> [last unloaded: netconsole] [ 262.708176] [ 262.708190] Pid: 1212, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug #1 System manufacturer System Product Name/P8P67 EVO [ 262.708268] RIP: 0010:[<ffffffff811e83ed>] [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 262.708316] RSP: 0018:ffff8801fe19bb68 EFLAGS: 00010202 [ 262.708347] RAX: ffff8801fe19bc8f RBX: 0000000000000011 RCX: 0000000000000002 [ 262.708387] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff8801fe19bc8f [ 262.708426] RBP: ffff8801fe19bbe0 R08: 0000000000000000 R09: 0000000000000000 [ 262.708466] R10: ffff8801fe19bc8f R11: ffff8801f3184780 R12: ffff8801fe19bca0 [ 262.708505] R13: ffff88020fc0ca48 R14: 0000000000000048 R15: 0000000000000011 [ 262.708545] FS: 0000000000000000(0000) GS:ffff88021ec40000(0000) knlGS:0000000000000000 [ 262.708590] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 262.708622] CR2: ffffffffff600400 CR3: 000000000180c000 CR4: 00000000000407e0 [ 262.708661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 262.708701] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 262.708741] 
Process btrfs-endio-met (pid: 1212, threadinfo ffff8801fe19a000, task ffff88020e07c2f0) [ 262.708790] Stack: [ 262.708804] ffffffffa01d8f1b ffff8802014c21b0 0000000000000011 0000000000000002 [ 262.708864] 00000000000003dd 00000000014c21b0 014aa470074a0149 ffff88020f95e000 [ 262.708925] ffff8801fe19bc8f 0000000000001000 ffff8801fe19bc58 00000000000003dd [ 262.708984] Call Trace: [ 262.709021] [<ffffffffa01d8f1b>] ? read_extent_buffer+0xbb/0x1a0 [btrfs] [ 262.709077] [<ffffffffa01d01fd>] btrfs_node_key+0x1d/0x20 [btrfs] [ 262.709128] [<ffffffffa02067af>] __readahead_hook+0x44f/0x500 [btrfs] [ 262.709179] [<ffffffffa0206b78>] btree_readahead_hook+0x18/0x40 [btrfs] [ 262.709233] [<ffffffffa01ac9d1>] btree_readpage_end_io_hook+0x111/0x270 [btrfs] [ 262.709291] [<ffffffffa01d4602>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 262.709349] [<ffffffffa01d524b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 262.709404] [<ffffffffa01abe61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 262.709443] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 262.709487] [<ffffffffa01abe6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 262.709539] [<ffffffffa01e2a97>] worker_loop+0x157/0x560 [btrfs] [ 262.709590] [<ffffffffa01e2940>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 262.709631] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 262.709662] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 262.709698] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 262.709735] [<ffffffff81419060>] ? 
gs_change+0x13/0x13 [ 262.709765] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c [ 262.710178] RIP [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 262.712431] RSP <ffff8801fe19bb68> [ 262.714672] ---[ end trace e93713a9d40cd06f ]--- [ 334.817046] btrfs: dm-6 checksum verify failed on 2239529385984 wanted 206D6165 found C4529DEC level 10 [ 334.818677] btrfs: node seems invalid now. checksum ok = 1 [ 334.820432] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239529385984, level 10, generation 7310033184097661728, nritems 996502889. len param 17. debug 2/989/996502889/7310033184097661728 [ 334.823857] ------------[ cut here ]------------ [ 334.825602] WARNING: at fs/btrfs/extent_io.c:4529 read_extent_buffer+0x167/0x1a0 [btrfs]() [ 334.827336] Hardware name: System Product Name [ 334.829049] Modules linked in: <omitted> [last unloaded: netconsole] [ 334.830814] Pid: 1132, comm: btrfs-endio-met Tainted: G D W 3.4.4+btrfsdebug #1 [ 334.832571] Call Trace: [ 334.834323] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 [ 334.836095] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20 [ 334.837883] [<ffffffffa01d8fc7>] read_extent_buffer+0x167/0x1a0 [btrfs] [ 334.839658] [<ffffffffa01d01fd>] btrfs_node_key+0x1d/0x20 [btrfs] [ 334.841427] [<ffffffffa02067af>] __readahead_hook+0x44f/0x500 [btrfs] [ 334.843209] [<ffffffffa0206b78>] btree_readahead_hook+0x18/0x40 [btrfs] [ 334.844996] [<ffffffffa01ac9d1>] btree_readpage_end_io_hook+0x111/0x270 [btrfs] [ 334.846807] [<ffffffffa01d4602>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 334.848618] [<ffffffffa01d524b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 334.850448] [<ffffffffa01abe61>] ? 
end_workqueue_fn+0x31/0x50 [btrfs] [ 334.852246] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 334.854051] [<ffffffffa01abe6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 334.855866] [<ffffffffa01e2a97>] worker_loop+0x157/0x560 [btrfs] [ 334.857674] [<ffffffffa01e2940>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 334.859471] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 334.861248] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 334.863039] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 334.864824] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 334.866604] ---[ end trace e93713a9d40cd070 ]--- [ 334.868388] general protection fault: 0000 [#2] SMP [ 334.870179] CPU 1 [ 334.870193] Modules linked in: <omitted> [last unloaded: netconsole] [ 334.873732] [ 334.875508] Pid: 1132, comm: btrfs-endio-met Tainted: G D W 3.4.4+btrfsdebug #1 System manufacturer System Product Name/P8P67 EVO [ 334.877390] RIP: 0010:[<ffffffff811e83ed>] [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 334.879272] RSP: 0018:ffff8801f2cbdb68 EFLAGS: 00010202 [ 334.881147] RAX: ffff8801f2cbdc8f RBX: 0000000000000011 RCX: 0000000000000002 [ 334.883023] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff8801f2cbdc8f [ 334.884885] RBP: ffff8801f2cbdbe0 R08: 0000000000000000 R09: 0000000000000000 [ 334.886768] R10: ffff8801f2cbdc8f R11: ffff8801f3184780 R12: ffff8801f2cbdca0 [ 334.888636] R13: ffff88020fcca468 R14: 0000000000000048 R15: 0000000000000011 [ 334.890521] FS: 0000000000000000(0000) GS:ffff88021ec40000(0000) knlGS:0000000000000000 [ 334.892409] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 334.894314] CR2: 00007f784a4fd5d8 CR3: 000000000180c000 CR4: 00000000000407e0 [ 334.896227] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 334.898169] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 334.900095] Process btrfs-endio-met (pid: 1132, threadinfo ffff8801f2cbc000, task ffff8801f491d940) [ 334.902067] Stack: [ 334.904011] ffffffffa01d8f1b ffff88023b656d69 
0000000000000011 0000000000000002 [ 334.906031] 00000000000003dd 000000003b656d69 6572747362757320 ffff88020d9a8000 [ 334.908045] ffff8801f2cbdc8f 0000000000001000 ffff8801f2cbdc58 00000000000003dd [ 334.910091] Call Trace: [ 334.912112] [<ffffffffa01d8f1b>] ? read_extent_buffer+0xbb/0x1a0 [btrfs] [ 334.914183] [<ffffffffa01d01fd>] btrfs_node_key+0x1d/0x20 [btrfs] [ 334.916243] [<ffffffffa02067af>] __readahead_hook+0x44f/0x500 [btrfs] [ 334.918312] [<ffffffffa0206b78>] btree_readahead_hook+0x18/0x40 [btrfs] [ 334.920376] [<ffffffffa01ac9d1>] btree_readpage_end_io_hook+0x111/0x270 [btrfs] [ 334.922471] [<ffffffffa01d4602>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] [ 334.924568] [<ffffffffa01d524b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] [ 334.926667] [<ffffffffa01abe61>] ? end_workqueue_fn+0x31/0x50 [btrfs] [ 334.928729] [<ffffffff81158988>] bio_endio+0x18/0x30 [ 334.930798] [<ffffffffa01abe6c>] end_workqueue_fn+0x3c/0x50 [btrfs] [ 334.932878] [<ffffffffa01e2a97>] worker_loop+0x157/0x560 [btrfs] [ 334.934952] [<ffffffffa01e2940>] ? btrfs_queue_worker+0x310/0x310 [btrfs] [ 334.937009] [<ffffffff81058e5e>] kthread+0x8e/0xa0 [ 334.939053] [<ffffffff81419064>] kernel_thread_helper+0x4/0x10 [ 334.941105] [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 [ 334.943190] [<ffffffff81419060>] ? gs_change+0x13/0x13 [ 334.945270] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c [ 334.950107] RIP [<ffffffff811e83ed>] memcpy+0xd/0x110 [ 334.952402] RSP <ffff8801f2cbdb68> [ 334.954732] ---[ end trace e93713a9d40cd071 ]--- ------------------------------------------------------------ Sami
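One quick way to check the observation that a checksum failure is reported for the same block that later triggers the "invalid parameters" warning is to pull the block addresses out of both kinds of message and intersect them. The Python sketch below uses lines copied from the try 6 log above; the regular expressions and the function name are illustrative choices, not part of any btrfs tooling.

```python
import re

# Lines copied verbatim from the try 6 log.
LOG = """\
[ 188.507789] btrfs: dm-6 checksum verify failed on 2237998628864 wanted C3D64F found DCBF0869 level 88
[ 262.706585] btrfs: dm-6 checksum verify failed on 2246859587584 wanted 3D013CF8 found AD06874A level 77
[ 262.706822] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2246859587584, level 77, generation 93067543380099401, nritems 21766576. len param 17. debug 2/989/21766576/93067543380099401
[ 334.817046] btrfs: dm-6 checksum verify failed on 2239529385984 wanted 206D6165 found C4529DEC level 10
[ 334.820432] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239529385984, level 10, generation 7310033184097661728, nritems 996502889. len param 17. debug 2/989/996502889/7310033184097661728
"""

CSUM_RE = re.compile(r"checksum verify failed on (\d+)")
EB_RE = re.compile(r"eb start is (\d+)")


def correlate(log_text):
    """Return (blocks in both message types, blocks with only a checksum failure)."""
    csum_blocks = {int(m) for m in CSUM_RE.findall(log_text)}
    bad_eb_blocks = {int(m) for m in EB_RE.findall(log_text)}
    return csum_blocks & bad_eb_blocks, csum_blocks - bad_eb_blocks


both, csum_only = correlate(LOG)
print("checksum failure followed by bad read:", sorted(both))
print("checksum failure only:", sorted(csum_only))
```

In this log both crashing reads hit a block that had just failed checksum verification on dm-6, while the first failure at 188 seconds did not lead to a crash.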
Jan Schmidt
2012-Jul-06 10:42 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 06.07.2012 01:47, Sami Liedes wrote:
> On Thu, Jul 05, 2012 at 03:41:33PM +0200, Jan Schmidt wrote:
>> I'd like to see if you corrupted your trees on disk in a really strange
>> manner (with matching checksums?), or if data comes from the disk intact
>> and becomes damaged thereafter.
>>
>> Could you store the output of
>> btrfs-debug-tree /dev/[whatever]
>> before try number 5 and afterwards? It will be quite a lot if you've got
>> a lot of files in there. Don't send it anywhere right now, just store it
>> away if possible.
>
> Ok, I have it stored. I mounted the fs read-only to do that just to be
> safe, because when I did it on a live fs I got some (~15) "parent
> transid verify failed" messages.

You can drop that, your data on disk is fine. Furthermore, I'm getting the feeling that this isn't really related to readahead, but we will find out later if that's true.

>> Can you double check that there's nothing about corrected errors in your
>> logs? Scrub will correct errors along the way and log that. So maybe
>> we've only a few tries left to find the root cause.
>
> Nope, definitely nothing there. A few "unlinked orphans" messages at
> mount time, but that's all. I'm sure I would have captured any further
> messages on my netconsole. Also nothing in any earlier logs that I can
> see.

Okay. At least that fits with the "everything is okay on disk" statement.

> Try 6, with your new patch - as a new thing, got checksum errors,
> including one very slightly before the crash and with the same "eb
> start" as in the "invalid parameters" message. Crashed in a couple of
> minutes.
>
> Now that I look at the the log below it actually has two crashes in
> it, one of btrfs-endio-met pid 1212 and another one of btrfs-endio-met
> pid some 72 seconds later. I didn't notice that until after the fact.
> > dm-6 (the device with checksum failures) is on a disk on which I don''t > have anything else besides this partition, so I suppose it could be > corrupting data without me having noticed. I know it used to work at > one time, but that doesn''t mean too much. Anyway, I''m sure I haven''t > seen any "checksum verify failed" messages until now. None in the logs > either. > > Ah, one thing I have forgotten to mention. I think I''ve been using > this filesystem for maybe a month now. It started as a single-device > filesystem, but I did restriping to raid-1 on June 12. dm-5 is the > original device, while dm-6 (the one with the checksum failures in try > 6) is the added one. I still have logs of the restriping and > everything after that; grep reveals no unusual lines containing > "btrfs" in all the logs since then, apart from the crashes. Besides > the oopses and lines that mention btrfs just because of the kernel > version 3.4.4+btrfsdebug, this is a summary of all the lines that > mention btrfs: > > btrfs: disk space caching is enabled > btrfs flagging fs with big metadata feature > btrfs: new size for /dev/mapper/rootvg-scratch_crypt is 751617179648 > * this device is not on the filesystem with problems - although I > haven''t tried scrubbing this filesystem > btrfs: unlinked 104 orphans (10 times with 4 different numbers) > > And during restriping > > btrfs: found 10008 extents (and similar) > btrfs: relocating block group 1019010351104 flags 1 (and similar) > > And two different warnings in the last few days, but I think this is > only because the debug patches changed the code: > > WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]() > WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]() > > Another thing that comes to mind that I might well be the only one > that uses btrfs with raid-1 on two separate dm-crypted devices > (contrary to the way suggested on Btrfs FAQ - in my experience that > can actually speed 
up things, especially on computers without hardware > crypto, because the kernel has only one kcryptd thread per dm-crypt > device and therefore only can use a single core per device. Plus I get > the benefits of btrfs "smart raid".). > > So, try 6. The three first lines are from me starting btrfs-debug-tree > again just as a quick way to verify that my netconsole setup was > working. The first checksum failure came pretty much instantly after > starting scrub: > > ------------------------------------------------------------ > [ 168.207151] device fsid f26f08b1-89e6-4f2d-b7f9-857010dd2517 devid 1 transid 151087 /dev/dm-5 > [ 168.208541] device fsid f26f08b1-89e6-4f2d-b7f9-857010dd2517 devid 2 transid 151087 /dev/dm-6 > [ 168.284668] device fsid ad75beab-b52b-4f63-9d14-956cc8e95fc7 devid 1 transid 5001 /dev/dm-9 > [ 188.507789] btrfs: dm-6 checksum verify failed on 2237998628864 wanted C3D64F found DCBF0869 level 88 > [ 188.507852] btrfs: node seems invalid now. checksum ok = 1 > [ 262.706585] btrfs: dm-6 checksum verify failed on 2246859587584 wanted 3D013CF8 found AD06874A level 77 > [ 262.706650] btrfs: node seems invalid now. checksum ok = 1Here it''s getting weird now. The checksums were fine when read from disk, which was checked ...> [ 262.706822] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2246859587584, level 77, generation 93067543380099401, nritems 21766576. len param 17. 
debug 2/989/21766576/93067543380099401 > [ 262.706929] ------------[ cut here ]------------ > [ 262.706992] WARNING: at fs/btrfs/extent_io.c:4529 read_extent_buffer+0x167/0x1a0 [btrfs]() > [ 262.707051] Hardware name: System Product Name > [ 262.707078] Modules linked in: <omitted> [last unloaded: netconsole] > [ 262.707126] Pid: 1212, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug #1 > [ 262.707169] Call Trace: > [ 262.707193] [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0 > [ 262.707229] [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20 > [ 262.707282] [<ffffffffa01d8fc7>] read_extent_buffer+0x167/0x1a0 [btrfs] > [ 262.707339] [<ffffffffa01d01fd>] btrfs_node_key+0x1d/0x20 [btrfs] > [ 262.707391] [<ffffffffa02067af>] __readahead_hook+0x44f/0x500 [btrfs] > [ 262.707444] [<ffffffffa0206b78>] btree_readahead_hook+0x18/0x40 [btrfs] > [ 262.707497] [<ffffffffa01ac9d1>] btree_readpage_end_io_hook+0x111/0x270 [btrfs]^^^^^^^^^^^^^^^^^^^^^^^^^^ ... down here in the stack. The warning is printed from two levels above, __readahead_hook. Either I''m absolutely blind and there''s code along the (rather short) road between those two that might do this I haven''t seen. Or someone else messes with our extent buffers or the underlying pages. What really confuses me is that it happens so reproducibly. I''ve no good idea at the moment how to go on. It might help to get a feeling if it''s shifting around at least a little bit or really constant in the timing of occurrence. So can you please apply the next patch on top of the other two and give it some more failure tries? The "checksum mismatch [1234]" line will be of most interest. I''m also curious what the additional debug variables will say in the extended version of the very first printk. You can leave out the stack traces if you like, they won''t matter much anyway. 
Thanks, -Jan diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 34122c2..df0b347 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -550,6 +550,7 @@ static noinline int check_node(struct btrfs_root *root, u32 nritems = btrfs_header_nritems(node); u64 generation; + node->debug[4] = 0xb77f50b77f5; if (nritems == 0) return 0; @@ -575,6 +576,10 @@ static noinline int check_node(struct btrfs_root *root, return -EIO; } } + node->debug[5] = node->start; + node->debug[6] = btrfs_header_level(node); + node->debug[6] |= btrfs_header_level(root->node) << 16; + node->debug[7] = 0xb22f50b22f5; return 0; } @@ -686,10 +691,17 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end, ret = -EIO; } + if (btrfs_csum_tree_block(root, eb)) + printk(KERN_ERR "btrfs: checksum mismatch 1 on %llu\n", + eb->start); + if (!ret) set_extent_buffer_uptodate(eb); err: if (test_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags)) { + if (btrfs_csum_tree_block(root, eb)) + printk(KERN_ERR "btrfs: checksum mismatch 2 on %llu\n", + eb->start); clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags); btree_readahead_hook(root, eb, eb->start, ret); } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 099ce6e..7452ecb 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4521,11 +4521,12 @@ void read_extent_buffer(struct extent_buffer *eb, void *dstv, unsigned long i = (start_offset + start) >> PAGE_CACHE_SHIFT; if (start > eb->len) { - printk(KERN_ERR "btrfs: invalid parameters for read_extent_buffer: start (%lu) > eb->len (%lu). eb start is %llu, level %d, generation %llu, nritems %d. len param %lu. debug %llu/%llu/%llu/%llu\n", + printk(KERN_ERR "btrfs: invalid parameters for read_extent_buffer: start (%lu) > eb->len (%lu). eb start is %llu, level %d, generation %llu, nritems %d. len param %lu. 
debug %llu/%llu/%llu/%llu/%#llx/%llu/%#llx/%#llx\n", start, eb->len, eb->start, btrfs_header_level(eb), btrfs_header_generation(eb), btrfs_header_nritems(eb), len, - eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3]); + eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3], + eb->debug[4], eb->debug[5], eb->debug[6], eb->debug[7]); WARN_ON(1); } WARN_ON(start + len > eb->start + eb->len); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 1bbf823..51c42f1 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -165,7 +165,7 @@ struct extent_buffer { struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES]; struct page **pages; - u64 debug[4]; + u64 debug[8]; }; static inline void extent_set_compress_type(unsigned long *bio_flags, diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c index b659c8d..ea81bd4 100644 --- a/fs/btrfs/reada.c +++ b/fs/btrfs/reada.c @@ -130,6 +130,10 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb, kref_get(&re->refcnt); spin_unlock(&fs_info->reada_lock); + if (!err && btrfs_csum_tree_block(root, eb)) + printk(KERN_ERR "btrfs: checksum mismatch 4 on %llu\n", + eb->start); + if (!re) return -1; @@ -248,6 +252,9 @@ int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb, { int ret; + if (!err && btrfs_csum_tree_block(root, eb)) + printk(KERN_ERR "btrfs: checksum mismatch 3 on %llu\n", + eb->start); ret = __readahead_hook(root, eb, start, err); reada_start_machine(root->fs_info); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
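The idea behind the numbered "checksum mismatch [1234]" lines in the patch above is to re-verify the same buffer at several points along the call path, so that the first point reporting a mismatch narrows down where the contents changed. A minimal userspace sketch of that technique follows. It uses Adler-32 as a stand-in checksum (btrfs itself uses crc32c), and `struct checkpoint_log` and `verify_at` are hypothetical names made up for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32, used here only as a simple stand-in for the real checksum. */
static uint32_t adler32(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* Remembers the first checkpoint at which the buffer no longer matched. */
struct checkpoint_log {
	int first_bad;	/* checkpoint id of the first mismatch, 0 if none */
};

/*
 * Recompute the checksum at checkpoint `id` and record the id if this is
 * the first time the buffer differs from `expected`.
 */
static void verify_at(struct checkpoint_log *log, int id,
		      const uint8_t *buf, size_t len, uint32_t expected)
{
	assert(log != NULL && buf != NULL);
	if (log->first_bad == 0 && adler32(buf, len) != expected)
		log->first_bad = id;
}
```

If checkpoint 1 passes and checkpoint 2 is the first to fail, the buffer was modified somewhere between those two call sites, which is the kind of answer the four printk lines are meant to give.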
Chris Mason
2012-Jul-06 11:50 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, Jul 06, 2012 at 04:42:50AM -0600, Jan Schmidt wrote:
> ... down here in the stack. The warning is printed from two levels above,
> __readahead_hook.
>
> Either I'm absolutely blind and there's code along the (rather short) road
> between those two that might do this I haven't seen. Or someone else messes
> with our extent buffers or the underlying pages. What really confuses me is
> that it happens so reproducibly.
>
> I've no good idea at the moment how to go on. It might help to get a feeling if
> it's shifting around at least a little bit or really constant in the timing of
> occurrence. So can you please apply the next patch on top of the other two and
> give it some more failure tries? The "checksum mismatch [1234]" line will be of
> most interest. I'm also curious what the additional debug variables will say in
> the extended version of the very first printk. You can leave out the stack
> traces if you like, they won't matter much anyway.

I would suggest turning on slab debug and CONFIG_DEBUG_PAGEALLOC. Something really strange is happening here.

-chris
Sami Liedes
2012-Jul-06 14:33 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
> I've no good idea at the moment how to go on. It might help to get a feeling if
> it's shifting around at least a little bit or really constant in the timing of
> occurrence. So can you please apply the next patch on top of the other two and
> give it some more failure tries? The "checksum mismatch [1234]" line will be of
> most interest. I'm also curious what the additional debug variables will say in
> the extended version of the very first printk. You can leave out the stack
> traces if you like, they won't matter much anyway.

Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as suggested by Chris Mason.

With those and the latest patch, there's an oops already at boot. I don't have netconsole yet at that point, but here are the important parts (sure I can capture it fully if you need).

By the way, something seems to be untabifying your patches. I don't know if it's on my side or yours, but at least some other patches I receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs mostly makes them apply cleanly for me.

Sami

------------------------------------------------------------
btrfs: disk space caching is enabled
BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
IP: [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs]
PGD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 6
Modules linked in: <omitted> [last unloaded: scsi_wait_scan]
Pid: 1176, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug2 #2 System Product Name/P8P67 EVO
RIP: 0010:[<ffffffffa0223568>] [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs
[...]
Process btrfs-endio-met (pid: 1176, [...])
Call trace:
[...] btree_readpage_end_io_hook+0x1e5/0x2d0 [btrfs]
[...] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[...] ? end_workqueue_fn+0x31/0x50 [btrfs]
[...] bio_endio+0x18/0x30
[...] end_workqueue_fn+0x3c/0x50 [btrfs]
[...] worker_loop+0x157/0x560 [btrfs]
[...] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[...] kthread+0x8e/0xa0
[...] kernel_thread_helper+0x4/0x10
[...] ? flush_kthread_worker+0x70/0x70
[...] ? gs_change+0x13/0x13
Code: [...]
RIP [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs]
RSP <ffff8801f3843cb0>
------------------------------------------------------------
Chris Mason
2012-Jul-06 14:40 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, Jul 06, 2012 at 08:33:51AM -0600, Sami Liedes wrote:
> On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
> > I've no good idea at the moment how to go on. It might help to get a feeling if
> > it's shifting around at least a little bit or really constant in the timing of
> > occurrence. So can you please apply the next patch on top of the other two and
> > give it some more failure tries? The "checksum mismatch [1234]" line will be of
> > most interest. I'm also curious what the additional debug variables will say in
> > the extended version of the very first printk. You can leave out the stack
> > traces if you like, they won't matter much anyway.
>
> Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
> suggested by Chris Mason.
>
> With those and the latest patch, there's an oops already at boot. I
> don't have netconsole yet at that point, but here's the important
> parts (sure I can capture it fully if you need).
>
> By the way, something seems to be untabifying your patches. I don't
> know if it's on my side or yours, but at least some other patches I
> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
> mostly makes them apply cleanly for me.
>
> Sami
>
>
> ------------------------------------------------------------
> btrfs: disk space caching is enabled
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
> IP: [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs]

This isn't from any of the new debugging. Can you please try it on an unpatched kernel?

-chris
Jan Schmidt
2012-Jul-06 15:02 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, July 06, 2012 at 16:40 (+0200), Chris Mason wrote:
> On Fri, Jul 06, 2012 at 08:33:51AM -0600, Sami Liedes wrote:
>> On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
>>> I've no good idea at the moment how to go on. It might help to get a feeling if
>>> it's shifting around at least a little bit or really constant in the timing of
>>> occurrence. So can you please apply the next patch on top of the other two and
>>> give it some more failure tries? The "checksum mismatch [1234]" line will be of
>>> most interest. I'm also curious what the additional debug variables will say in
>>> the extended version of the very first printk. You can leave out the stack
>>> traces if you like, they won't matter much anyway.
>>
>> Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
>> suggested by Chris Mason.
>>
>> With those and the latest patch, there's an oops already at boot. I
>> don't have netconsole yet at that point, but here's the important
>> parts (sure I can capture it fully if you need).
>>
>> By the way, something seems to be untabifying your patches. I don't
>> know if it's on my side or yours, but at least some other patches I
>> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
>> mostly makes them apply cleanly for me.
>>
>> Sami
>>
>>
>> ------------------------------------------------------------
>> btrfs: disk space caching is enabled
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
>> IP: [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs]
>
> This isn't from any of the new debugging. Can you please try it on an
> unpatched kernel?

You're confusing that with check_leaf. I added check_node along the way, see my mail from Thu, July 05, 2012 at 15:41 (+0200). I'd really like to add something similar for the 3.6 series.

Checking for the null pointer dereference.

-Jan
Jan Schmidt
2012-Jul-06 15:09 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, July 06, 2012 at 16:33 (+0200), Sami Liedes wrote:
> On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
>> I've no good idea at the moment how to go on. It might help to get a feeling if
>> it's shifting around at least a little bit or really constant in the timing of
>> occurrence. So can you please apply the next patch on top of the other two and
>> give it some more failure tries? The "checksum mismatch [1234]" line will be of
>> most interest. I'm also curious what the additional debug variables will say in
>> the extended version of the very first printk. You can leave out the stack
>> traces if you like, they won't matter much anyway.
>
> Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
> suggested by Chris Mason.
>
> With those and the latest patch, there's an oops already at boot. I
> don't have netconsole yet at that point, but here's the important
> parts (sure I can capture it fully if you need).

Oh I see. root->node can be NULL during mount. Please add this on top:

--
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df0b347..22838a3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -578,7 +578,8 @@ static noinline int check_node(struct btrfs_root *root,
 	}
 	node->debug[5] = node->start;
 	node->debug[6] = btrfs_header_level(node);
-	node->debug[6] |= btrfs_header_level(root->node) << 16;
+	if (root->node)
+		node->debug[6] |= btrfs_header_level(root->node) << 16;
 	node->debug[7] = 0xb22f50b22f5;
 	return 0;
--

> By the way, something seems to be untabifying your patches. I don't
> know if it's on my side or yours, but at least some other patches I
> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
> mostly makes them apply cleanly for me.

Oh, I'm sorry. Should have been on my side. I hope it's better with the current diff?

-Jan
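The fix above boils down to: only fold the root's level into the debug word when the root node pointer is actually set. A small userspace sketch of that guard, with hypothetical `toy_node` and `toy_root` structs standing in for the kernel's extent buffer and btrfs_root (the real structures are far larger and the level is read from the on-disk header):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not the kernel types. */
struct toy_node {
	unsigned int level;
};

struct toy_root {
	struct toy_node *node;	/* may be NULL, e.g. early during mount */
};

/*
 * Pack the node's own level into the low 16 bits and, if the root node
 * is available, the root's level into bits 16 and up.
 */
static uint64_t pack_levels(const struct toy_node *n, const struct toy_root *r)
{
	uint64_t packed;

	assert(n != NULL && r != NULL);
	packed = n->level;
	if (r->node)	/* the guard the follow-up patch adds */
		packed |= (uint64_t)r->node->level << 16;
	return packed;
}
```

With the guard in place a missing root node simply leaves the upper bits zero instead of dereferencing a NULL pointer.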
Chris Mason
2012-Jul-06 15:19 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, Jul 06, 2012 at 09:02:34AM -0600, Jan Schmidt wrote:
> On Fri, July 06, 2012 at 16:40 (+0200), Chris Mason wrote:
> > On Fri, Jul 06, 2012 at 08:33:51AM -0600, Sami Liedes wrote:
> >> On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
> >>> I've no good idea at the moment how to go on. It might help to get a feeling if
> >>> it's shifting around at least a little bit or really constant in the timing of
> >>> occurrence. So can you please apply the next patch on top of the other two and
> >>> give it some more failure tries? The "checksum mismatch [1234]" line will be of
> >>> most interest. I'm also curious what the additional debug variables will say in
> >>> the extended version of the very first printk. You can leave out the stack
> >>> traces if you like, they won't matter much anyway.
> >>
> >> Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
> >> suggested by Chris Mason.
> >>
> >> With those and the latest patch, there's an oops already at boot. I
> >> don't have netconsole yet at that point, but here's the important
> >> parts (sure I can capture it fully if you need).
> >>
> >> By the way, something seems to be untabifying your patches. I don't
> >> know if it's on my side or yours, but at least some other patches I
> >> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
> >> mostly makes them apply cleanly for me.
> >>
> >> Sami
> >>
> >>
> >> ------------------------------------------------------------
> >> btrfs: disk space caching is enabled
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
> >> IP: [<ffffffffa0223568>] check_node+0x138/0x250 [btrfs]
> >
> > This isn't from any of the new debugging. Can you please try it on an
> > unpatched kernel?
>
> You're confusing that with check_leaf. I added check_node along the way, see my
> mail from Thu, July 05, 2012 at 15:41 (+0200). I'd really like to add something
> similar for the 3.6 series.
>
> Checking for the null pointer dereference.

Sorry, I wasn't clear. I meant it wasn't from slab debug or DEBUG_PAGEALLOC, so it must be new in your patches ;)

-chris
Sami Liedes
2012-Jul-06 21:41 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Fri, Jul 06, 2012 at 10:59:24PM +0300, Sami Liedes wrote:
> I think I might try running it overnight with KMEMCHECK to see if it
> reports something. But for now, what there's in the log:

My KMEMCHECK kernel didn't even boot (due to some weird KMEMCHECK/ACPI interaction), so I won't pursue this idea further at the moment...

> * lots of checksum mismatch [234], no 1s

One thing to notice from the logs, too, is that the device seems to always be dm-6, the second device of the filesystem. This never seems to happen to dm-5. There are 1583 lines of "btrfs: dm-6 checksum verify failed".

Sami
Sami Liedes
2012-Jul-06 23:44 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
[Retry: I think this mail didn't make it to the list, probably because of the 73 kilobyte attached log. Here's a URL to the file:]

http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz

Sami

------------------------------------------------------------
On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
> Oh I see. root->node can be NULL during mount. Please add this on top:

Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time it took half an hour to crash, and there's _lots_ of checksum mismatch [234] messages even before the crash. gzipped dmesg attached.

At 781 seconds there's an "irq 17: nobody cared". That's a known bug with this (and other Asus) motherboards and happens every now and then. I doubt it has anything to do with this.

I think I might try running it overnight with KMEMCHECK to see if it reports something. But for now, what there's in the log:

* lots of checksum mismatch [234], no 1s

* a fair number of "csum_tree_block: [0-9]+ callbacks suppressed" lines

* two "btrfs: node seems invalid now. checksum ok = 1" messages, one at 1499 seconds and another just before the crash at 1973

* Just before the crash:
  btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0

* the oopses

> > By the way, something seems to be untabifying your patches. I don't
> > know if it's on my side or yours, but at least some other patches I
> > receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
> > mostly makes them apply cleanly for me.
>
> Oh, I'm sorry. Should have been on my side. I hope it's better with the current
> diff?

Yes. No problem :)

[See attachment for dmesg log.]

Sami
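The "start (32771) > eb->len (32768)" message can be read as a node key lookup that ran past the end of a 32 KiB node. The sketch below assumes a 101-byte node header, 33-byte key pointers and 17-byte disk keys; those sizes are assumptions about the on-disk layout rather than something stated in this thread, and `node_key_offset` and `checked_read` are hypothetical helper names. Under those assumptions slot 990 lands exactly at offset 32771, one slot beyond the 989 the debug output shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed on-disk sizes; not taken from this thread. */
#define HDR_SIZE	101u	/* node header */
#define KEY_PTR_SIZE	33u	/* one key pointer: key + block pointer + generation */
#define DISK_KEY_SIZE	17u	/* the key itself, matching "len param 17" */

/* Byte offset of the key stored in `slot` of a node. */
static size_t node_key_offset(size_t slot)
{
	return HDR_SIZE + slot * KEY_PTR_SIZE;
}

/*
 * Copy `len` bytes starting at `start` out of a buffer of `buf_len`
 * bytes. Returns 0 on success and -1 if the range is out of bounds.
 * The comparison is written so that start + len cannot overflow.
 */
static int checked_read(const unsigned char *buf, size_t buf_len,
			void *dst, size_t start, size_t len)
{
	assert(buf != NULL && dst != NULL);
	if (start > buf_len || len > buf_len - start)
		return -1;
	memcpy(dst, buf + start, len);
	return 0;
}
```

A reader that refuses out-of-range requests like this would turn the later memcpy fault into an ordinary error return; whether that is the right fix for the kernel code is a separate question from where the bogus nritems value comes from.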
Arne Jansen
2012-Jul-09 09:05 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 07.07.2012 01:44, Sami Liedes wrote:
> [Retry: I think this mail didn't make it to the list, probably because
> of the 73 kilobyte attached log. Here's a URL to the file:]
>
> http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz
>
> Sami
>
>
> ------------------------------------------------------------
> On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
>> Oh I see. root->node can be NULL during mount. Please add this on top:
>
> Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time
> it took half an hour to crash, and there's _lots_ of checksum mismatch
> [234] messages even before the crash. gzipped dmesg attached.
>
> At 781 seconds there's an "irq 17: nobody cared". That's a known bug
> with this (and other Asus) motherboards and happens every now and
> then. I doubt it has anything to do with this.
>
> I think I might try running it overnight with KMEMCHECK to see if it
> reports something. But for now, what there's in the log:
>
> * lots of checksum mismatch [234], no 1s
>
> * a fair number of "csum_tree_block: [0-9]+ callbacks suppressed"
> lines
>
> * two "btrfs: node seems invalid now. checksum ok = 1" messages, one
> at 1499 seconds and another just before the crash at 1973
>
> * Just before the crash:
> btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0
>

At a first glance: the generation converted to ascii is: "ent() ==", so someone is patching the memory with ascii text, possibly C source. It might be interesting to dump the full contents of the eb, to get a clue on the source of the data.

> * the oopses
>
>>> By the way, something seems to be untabifying your patches. I don't
>>> know if it's on my side or yours, but at least some other patches I
>>> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
>>> mostly makes them apply cleanly for me.
>>
>> Oh, I'm sorry. Should have been on my side. I hope it's better with the current
>> diff?
>
> Yes. No problem :)
>
> [See attachment for dmesg log.]
>
> Sami
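Arne's reading of the generation value can be reproduced by splitting the 64-bit number into its eight bytes, least significant first, which is the order a little-endian field occupies in memory. A minimal sketch; `u64_to_ascii_le` is a hypothetical helper written for this illustration, not something from btrfs or btrfs-progs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the eight bytes of `value`, least significant byte first, into
 * `out` as a NUL-terminated string. Bytes outside the printable ASCII
 * range are shown as '.'. `out` must have room for 9 characters.
 */
static void u64_to_ascii_le(uint64_t value, char out[9])
{
	assert(out != NULL);
	for (size_t i = 0; i < 8; i++) {
		unsigned char byte = (unsigned char)(value >> (8 * i));

		out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
	}
	out[8] = '\0';
}
```

Feeding it 4412718571037421157 yields the string "ent() ==", matching the observation above. The reported level of 100 is likewise 0x64, the ASCII code for 'd'.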
Sami Liedes
2012-Jul-10 04:16 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:> > * Just before the crash: > > btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0 > > > > At a first glance: the generation converted to ascii is: "ent() ==", > so someone is patching the memory with ascii text, possibly C source. > It might be interesting to dump the full contents of the eb, to get > a clue on the source of the data.I changed the code to dump the contents of the eb struct at the point where that error (btrfs: invalid parameters...) is printed, at the "checksum mismatch 4" site and at the "node seems invalid now" site. Now I have a big log of 1795 corrupted ebs. So far nothing that looks remotely like ascii text, though. But I have two different versions of the eb that caused that warning, a less corrupted one and a more corrupted one: ------------------------------------------------------------ btrfs: --- start eb contents at ffff8801b13cc4c8 --- btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............ btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff .......0....... btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<............. btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........ btrfs: --- end eb contents at ffff8801b13cc4c8 --- btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 3579FB59 level 160 btrfs: node seems invalid now. checksum ok = 1 btrfs: --- start eb contents at ffff8801b13cc4c8 --- [... identical dump to above ...] btrfs: --- end eb contents at ffff8801b13cc4c8 --- btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239404212224, level 160, generation 4716553384049587249, nritems 295705211. 
len param 17. debug 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0 btrfs: --- start eb contents at ffff8801b13cc4c8 --- btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............ btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff .......0....... btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<.............
btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{.......
btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........
btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........
btrfs: --- end eb contents at ffff8801b13cc4c8 ---
------------[ cut here ]------------
WARNING: at fs/btrfs/extent_io.c:4533 read_extent_buffer+0x1ef/0x220 [btrfs]()
Hardware name: System Product Name
Modules linked in: <omitted> [last unloaded: scsi_wait_scan]
Pid: 25829, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug3+ #4
[...]
------------------------------------------------------------

Here's the difference (lines reordered from diff to make comparing
easier):

------------------------------------------------------------
-btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<.............
+btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<.............
-btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{.......
-btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........
------------------------------------------------------------

If there's one pattern that catches the eye in the dumps, it's that in
many places where the same eb is dumped multiple times due to multiple
"checksum mismatch 4"s, there are bytes at offsets 0x30 and 0x31 that
always seem to have the same value and both separately increase
between the dumps, usually(?) by two. Then that might be normal, I
haven't looked at what should be at a struct eb :) There are some
instances where they wrap, for example

f8 f8 -> fa fa -> fc fc -> fe fe -> 00 00 -> 02 02 -> ...,

with no other changes visible in the corrupted ebs. But that is by no
means the only change in the struct eb contents that can be observed,
only the most obvious pattern.

For example, here's an 11-long string of eb contents. First the dmesg:

------------------------------------------------------------
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 1]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 2]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 3]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 4]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 5]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 6]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 7]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 8]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 9]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 10]
btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
btrfs: checksum mismatch 4 on 1127200522240
[DUMP 11]
------------------------------------------------------------

Where DUMP 1 contents (file cc1.txt if you'd want to apply the diffs
below) is

------------------------------------------------------------
btrfs: --- start eb contents at ffff8801cafbacc0 ---
btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............
btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0.......
btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "...............
btrfs: ffff8801cafbad30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad40: 00 00 10 00 00 00 00 00 04 04 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad50: 50 ad fb ca 01 88 ff ff 50 ad fb ca 01 88 ff ff P.......P.......
btrfs: ffff8801cafbad60: 02 02 00 00 00 00 00 00 68 ad fb ca 01 88 ff ff ........h.......
btrfs: ffff8801cafbad70: 68 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 h...............
btrfs: ffff8801cafbad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbad90: 00 a2 20 08 00 ea ff ff c0 2a f4 07 00 ea ff ff .. ......*......
btrfs: ffff8801cafbada0: 00 ac 89 07 00 ea ff ff c0 42 79 07 00 ea ff ff .........By.....
btrfs: ffff8801cafbadb0: 80 af 77 07 00 ea ff ff 40 37 49 08 00 ea ff ff ..w.....@7I.....
btrfs: ffff8801cafbadc0: 80 6d f3 07 00 ea ff ff c0 a7 e8 07 00 ea ff ff .m..............
btrfs: ffff8801cafbadd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ btrfs: --- end eb contents at ffff8801cafbacc0 --- ------------------------------------------------------------ And the successive diffs are ------------------------------------------------------------ --- cc1.txt 2012-07-10 06:57:21.564665577 +0300 +++ cc2.txt 2012-07-10 06:59:10.272634578 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... @@ -20,9 +20,9 @@ btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ -btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ -btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ -btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 01 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbae20: 4d 00 00 00 00 00 00 00 4e 00 00 00 00 00 00 00 M.......N....... +btrfs: ffff8801cafbae30: 07 78 02 00 00 00 00 00 00 00 00 00 00 00 00 00 .x.............. btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ btrfs: --- end eb contents at ffff8801cafbacc0 --- --- cc2.txt 2012-07-10 06:59:10.272634578 +0300 +++ cc3.txt 2012-07-10 06:59:10.016634663 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc3.txt 2012-07-10 06:59:10.016634663 +0300 +++ cc4.txt 2012-07-10 06:59:09.752634749 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc4.txt 2012-07-10 06:59:09.752634749 +0300 +++ cc5.txt 2012-07-10 06:59:09.504634831 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc5.txt 2012-07-10 06:59:09.504634831 +0300 +++ cc6.txt 2012-07-10 06:59:09.240634917 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 
btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc6.txt 2012-07-10 06:59:09.240634917 +0300 +++ cc7.txt 2012-07-10 06:59:08.944635015 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc7.txt 2012-07-10 06:59:08.944635015 +0300 +++ cc8.txt 2012-07-10 06:59:08.656635109 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ +btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... 
--- cc8.txt 2012-07-10 06:59:08.656635109 +0300 +++ cc9.txt 2012-07-10 06:59:08.328635216 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. +btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##.............. btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc9.txt 2012-07-10 06:59:08.328635216 +0300 +++ cc10.txt 2012-07-10 06:58:33.180646115 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... -btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##.............. +btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%.............. btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... --- cc10.txt 2012-07-10 06:58:33.180646115 +0300 +++ cc11.txt 2012-07-10 06:58:41.340643696 +0300 @@ -2,7 +2,7 @@ btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
 btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0.......
-btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%..............
+btrfs: ffff8801cafbacf0: 27 27 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ''..............
 btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "...............
------------------------------------------------------------

And since it's of course possible there's something wrong with my
modifications, here's the patch I used:

------------------------------------------------------------
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7452ecb..aadb82c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4527,6 +4527,9 @@ void read_extent_buffer(struct extent_buffer *eb, void *dstv,
 		       len, eb->debug[0], eb->debug[1], eb->debug[2],
 		       eb->debug[3], eb->debug[4], eb->debug[5],
 		       eb->debug[6], eb->debug[7]);
+		printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
+		print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
+		printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
 		WARN_ON(1);
 	}
 	WARN_ON(start + len > eb->start + eb->len);
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ea81bd4..663d6c4 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -130,9 +130,13 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 		kref_get(&re->refcnt);
 	spin_unlock(&fs_info->reada_lock);
 
-	if (!err && btrfs_csum_tree_block(root, eb))
+	if (!err && btrfs_csum_tree_block(root, eb)) {
 		printk(KERN_ERR "btrfs: checksum mismatch 4 on %llu\n",
 		       eb->start);
+		printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
+		print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
+		printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
+	}
 
 	if (!re)
 		return -1;
@@ -150,9 +154,14 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 	if (err == 0) {
 		nritems = level ? btrfs_header_nritems(eb) : 0;
 		if (level > BTRFS_MAX_LEVEL ||
-		    nritems > BTRFS_NODEPTRS_PER_BLOCK(root))
+		    nritems > BTRFS_NODEPTRS_PER_BLOCK(root)) {
 			printk(KERN_ERR "btrfs: node seems invalid now. checksum ok = %d\n",
 			       btrfs_csum_tree_block(root, eb));
+			printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
+			print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
+			printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
+
+		}
 		generation = btrfs_header_generation(eb);
 		/*
 		 * FIXME: currently we just set nritems to 0 if this is a leaf,
------------------------------------------------------------

Sami
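The three rows that differ between the two dumps of the eb at
ffff8801b13cc4c8 can be decoded by hand, and they appear to line up with
the first four "debug" values printed in the read_extent_buffer warning
(2/989/295705211/4716553384049587249). The Python sketch below shows one
way to check that, and to locate the bytes that keep incrementing at
offset 0x30. It assumes a little-endian machine and 64-bit fields, neither
of which is stated in the thread, and the helper names are made up for
illustration; it only parses the text of dump rows quoted above.

```python
import string
import struct

HEX_DIGITS = set(string.hexdigits)


def parse_dump_row(row):
    """Split one hex dump row into (address, data bytes).

    Expects the layout seen in the logs above:
    'btrfs: <hex address>: <up to 16 hex bytes> <ASCII column>'.
    A leading '+' or '-' from a diff is tolerated.
    """
    fields = row.split()
    address = int(fields[1].rstrip(":"), 16)
    data = bytearray()
    for token in fields[2:]:
        # Stop at the ASCII column or once a full 16-byte row is read.
        if len(data) == 16 or len(token) != 2 or not set(token) <= HEX_DIGITS:
            break
        data.append(int(token, 16))
    return address, bytes(data)


def le_u64_values(data):
    """Interpret data as consecutive little-endian 64-bit integers (assumed)."""
    count = len(data) // 8
    return list(struct.unpack("<%dQ" % count, data[:count * 8]))


def ascii_column(data):
    """Render bytes the way the dump's right-hand column does."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# The "+" rows from the comparison of the two dumps of ffff8801b13cc4c8.
changed_rows = [
    "+btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<.............",
    "+btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{.......",
    "+btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........",
]
tail = b"".join(parse_dump_row(row)[1] for row in changed_rows)
values = le_u64_values(tail)
print(values[1:5])               # [2, 989, 295705211, 4716553384049587249]
print(ascii_column(tail[32:40])) # 14q<P.tA

# The row holding the incrementing byte pair in DUMP 1 (eb at ffff8801cafbacc0).
EB_BASE = 0xFFFF8801CAFBACC0
counter_row = "btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................"
row_address, row_data = parse_dump_row(counter_row)
counter_offset = row_address - EB_BASE
print(hex(counter_offset), row_data[0], row_data[1])  # 0x30 19 19
```

The generation value decodes to the same eight bytes that show up as
"14q<P.tA" in the ASCII column of the row at ffff8801b13cc638, so the
number in the warning is simply the raw contents of that slot read as an
integer.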
Arne Jansen
2012-Jul-10 06:05 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
Thanks for doing this, I'll start looking into it right away.
One thing though: it's probably not the contents of struct eb that's
being corrupted, but the data the eb points to. See for example
read_extent_buffer to see how to access it. Sorry for being unclear
on this.

Thanks,
Arne

On 10.07.2012 06:16, Sami Liedes wrote:
> On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:
>>> * Just before the crash:
>>> btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0
>>>
>>
>> At a first glance: the generation converted to ascii is: "ent() ==",
>> so someone is patching the memory with ascii text, possibly C source.
>> It might be interesting to dump the full contents of the eb, to get
>> a clue on the source of the data.
>
> I changed the code to dump the contents of the eb struct at the point
> where that error (btrfs: invalid parameters...) is printed, at the
> "checksum mismatch 4" site and at the "node seems invalid now" site.
> Now I have a big log of 1795 corrupted ebs. So far nothing that looks
> remotely like ascii text, though. But I have two different versions of
> the eb that caused that warning, a less corrupted one and a more
> corrupted one:
>
> ------------------------------------------------------------
> btrfs: --- start eb contents at ffff8801b13cc4c8 ---
> btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............
> btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff  .......0.......
> btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................
> btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... > btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... > btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. > btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. > btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... > btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... > btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... > btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<............. > btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........ 
> btrfs: --- end eb contents at ffff8801b13cc4c8 --- > btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 3579FB59 level 160 > btrfs: node seems invalid now. checksum ok = 1 > btrfs: --- start eb contents at ffff8801b13cc4c8 --- > [... identical dump to above ...] > btrfs: --- end eb contents at ffff8801b13cc4c8 --- > btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239404212224, level 160, generation 4716553384049587249, nritems 295705211. len param 17. debug 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0 > btrfs: --- start eb contents at ffff8801b13cc4c8 --- > btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............ > btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff .......0....... > btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... > btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... > btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. > btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. 
> btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... > btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... > btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... > btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<............. > btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{....... > btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........ > btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801b13cc4c8 --- > ------------[ cut here ]------------ > WARNING: at fs/btrfs/extent_io.c:4533 read_extent_buffer+0x1ef/0x220 [btrfs]() > Hardware name: System Product Name > Modules linked in: <omitted> [last unloaded: scsi_wait_scan] > Pid: 25829, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug3+ #4 > [...] > ------------------------------------------------------------ > > Here''s the difference (lines reordered from diff to make comparing > easier): > > ------------------------------------------------------------ > -btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<............. > +btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<............. > -btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> +btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{.......
> -btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> +btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........
> ------------------------------------------------------------
>
> If there's one pattern that catches the eye in the dumps, it's that in
> many places where the same eb is dumped multiple times due to multiple
> "checksum mismatch 4"s, there are bytes at offsets 0x30 and 0x31 that
> always seem to have the same value and both separately increase
> between the dumps, usually(?) by two. Then that might be normal, I
> haven't looked at what should be at a struct eb :) There are some
> instances where they wrap, for example
>
> f8 f8 -> fa fa -> fc fc -> fe fe -> 00 00 -> 02 02 -> ...,
>
> with no other changes visible in the corrupted ebs. But that is by no
> means the only change in the struct eb contents that can be observed,
> only the most obvious pattern.
>
> For example, here's an 11-long string of eb contents. First the dmesg:
>
> ------------------------------------------------------------
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 1]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 2]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 3]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 4]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 5]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 6]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 7]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 8]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 9]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 10]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 11]
> ------------------------------------------------------------
>
> Where DUMP 1 contents (file cc1.txt if you'd want to apply the diffs
> below) is
>
> ------------------------------------------------------------
> btrfs: --- start eb contents at ffff8801cafbacc0
--- > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > btrfs: ffff8801cafbad30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad40: 00 00 10 00 00 00 00 00 04 04 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad50: 50 ad fb ca 01 88 ff ff 50 ad fb ca 01 88 ff ff P.......P....... > btrfs: ffff8801cafbad60: 02 02 00 00 00 00 00 00 68 ad fb ca 01 88 ff ff ........h....... > btrfs: ffff8801cafbad70: 68 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 h............... > btrfs: ffff8801cafbad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad90: 00 a2 20 08 00 ea ff ff c0 2a f4 07 00 ea ff ff .. ......*...... > btrfs: ffff8801cafbada0: 00 ac 89 07 00 ea ff ff c0 42 79 07 00 ea ff ff .........By..... > btrfs: ffff8801cafbadb0: 80 af 77 07 00 ea ff ff 40 37 49 08 00 ea ff ff ..w.....@7I..... > btrfs: ffff8801cafbadc0: 80 6d f3 07 00 ea ff ff c0 a7 e8 07 00 ea ff ff .m.............. > btrfs: ffff8801cafbadd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801cafbacc0 --- > ------------------------------------------------------------ > > And the successive diffs are > > ------------------------------------------------------------ > --- cc1.txt 2012-07-10 06:57:21.564665577 +0300 > +++ cc2.txt 2012-07-10 06:59:10.272634578 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > @@ -20,9 +20,9 @@ > btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > -btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ > -btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> -btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 01 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbae20: 4d 00 00 00 00 00 00 00 4e 00 00 00 00 00 00 00 M.......N....... > +btrfs: ffff8801cafbae30: 07 78 02 00 00 00 00 00 00 00 00 00 00 00 00 00 .x.............. > btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801cafbacc0 --- > --- cc2.txt 2012-07-10 06:59:10.272634578 +0300 > +++ cc3.txt 2012-07-10 06:59:10.016634663 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc3.txt 2012-07-10 06:59:10.016634663 +0300 > +++ cc4.txt 2012-07-10 06:59:09.752634749 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 
> +btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc4.txt 2012-07-10 06:59:09.752634749 +0300 > +++ cc5.txt 2012-07-10 06:59:09.504634831 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc5.txt 2012-07-10 06:59:09.504634831 +0300 > +++ cc6.txt 2012-07-10 06:59:09.240634917 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc6.txt 2012-07-10 06:59:09.240634917 +0300 > +++ cc7.txt 2012-07-10 06:59:08.944635015 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc7.txt 2012-07-10 06:59:08.944635015 +0300 > +++ cc8.txt 2012-07-10 06:59:08.656635109 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... 
> --- cc8.txt 2012-07-10 06:59:08.656635109 +0300 > +++ cc9.txt 2012-07-10 06:59:08.328635216 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. > +btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc9.txt 2012-07-10 06:59:08.328635216 +0300 > +++ cc10.txt 2012-07-10 06:58:33.180646115 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##.............. > +btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc10.txt 2012-07-10 06:58:33.180646115 +0300 > +++ cc11.txt 2012-07-10 06:58:41.340643696 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ 
> btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%.............. > +btrfs: ffff8801cafbacf0: 27 27 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ''''.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > ------------------------------------------------------------ > > And since it''s of course possible there''s something wrong with my > modifications, here''s the patch I used: > > ------------------------------------------------------------ > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 7452ecb..aadb82c 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -4527,6 +4527,9 @@ void read_extent_buffer(struct extent_buffer *eb, void *dstv, > len, > eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3], > eb->debug[4], eb->debug[5], eb->debug[6], eb->debug[7]); > + printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb); > + print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true); > + printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb); > WARN_ON(1); > } > WARN_ON(start + len > eb->start + eb->len); > diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c > index ea81bd4..663d6c4 100644 > --- a/fs/btrfs/reada.c > +++ b/fs/btrfs/reada.c > @@ -130,9 +130,13 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb, > kref_get(&re->refcnt); > spin_unlock(&fs_info->reada_lock); > > - if (!err && btrfs_csum_tree_block(root, eb)) > + if (!err && btrfs_csum_tree_block(root, eb)) { > printk(KERN_ERR "btrfs: checksum mismatch 4 on %llu\n", 
> eb->start); > + printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb); > + print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true); > + printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb); > + } > > if (!re) > return -1; > @@ -150,9 +154,14 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb, > if (err == 0) { > nritems = level ? btrfs_header_nritems(eb) : 0; > if (level > BTRFS_MAX_LEVEL || > - nritems > BTRFS_NODEPTRS_PER_BLOCK(root)) > + nritems > BTRFS_NODEPTRS_PER_BLOCK(root)) { > printk(KERN_ERR "btrfs: node seems invalid now. checksum ok = %d\n", > btrfs_csum_tree_block(root, eb)); > + printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb); > + print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true); > + printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb); > + > + } > generation = btrfs_header_generation(eb); > /* > * FIXME: currently we just set nritems to 0 if this is a leaf, > > ------------------------------------------------------------ > > Sami
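The remark earlier in the thread that the generation value 4412718571037421157 reads as the ASCII text "ent() ==" can be checked by printing the eight bytes of the value in memory order. What follows is a minimal sketch in C, assuming a little-endian machine such as the x86-64 box in this report. The helper name u64_to_ascii_le is made up for illustration and is not part of btrfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * Write the 8 bytes of `value`, least significant byte first (the order
 * in which a little-endian CPU stores it in memory), into `out` as text.
 * Bytes outside printable ASCII become '.', like the ASCII column of a
 * hex dump. `out` must have room for 8 characters plus the NUL.
 */
static void u64_to_ascii_le(uint64_t value, char *out, size_t out_len)
{
    assert(out != NULL && out_len >= 9);

    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));

        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Called with 4412718571037421157 and a 9-byte buffer, the helper fills the buffer with "ent() ==". Called with the generation from the second warning, 4716553384049587249, it gives "14q<P.tA", which is the text the hex dump shows in its ASCII column for those eight bytes.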
Arne Jansen
2012-Jul-10 06:57 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On 10.07.2012 06:16, Sami Liedes wrote:> On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote: >>> * Just before the crash: >>> btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0 >>> >> >> At a first glance: the generation converted to ascii is: "ent() ==", >> so someone is patching the memory with ascii text, possibly C source. >> It might be interesting to dump the full contents of the eb, to get >> a clue on the source of the data. > > I changed the code to dump the contents of the eb struct at the point > where that error (btrfs: invalid parameters...) is printed, at the > "checksum mismatch 4" site and at the "node seems invalid now" site. > Now I have a big log of 1795 corrupted ebs. So far nothing that looks > remotely like ascii text, though. But I have two different versions of > the eb that caused that warning, a less corrupted one and a more > corrupted one: > > ------------------------------------------------------------ > btrfs: --- start eb contents at ffff8801b13cc4c8 --- > btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............ > btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff .......0....... > btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... > btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... > btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. > btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. > btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... > btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... > btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... > btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<............. > btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........this one looks good so far.> btrfs: --- end eb contents at ffff8801b13cc4c8 --- > btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 3579FB59 level 160 > btrfs: node seems invalid now. checksum ok = 1 > btrfs: --- start eb contents at ffff8801b13cc4c8 --- > [... identical dump to above ...] 
> btrfs: --- end eb contents at ffff8801b13cc4c8 --- > btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2239404212224, level 160, generation 4716553384049587249, nritems 295705211. len param 17. debug 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0 > btrfs: --- start eb contents at ffff8801b13cc4c8 --- > btrfs: ffff8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00 ...f............ > btrfs: ffff8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff .......0....... > btrfs: ffff8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff X.<.....X.<..... > btrfs: ffff8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff ........p.<..... > btrfs: ffff8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 p.<............. > btrfs: ffff8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff ._.............. > btrfs: ffff8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff @L.......f...... > btrfs: ffff8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff ........@....... > btrfs: ffff8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff @............... 
> btrfs: ffff8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<............. > btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{....... > btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........ > btrfs: ffff8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801b13cc658: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801b13cc4c8 --- > ------------[ cut here ]------------ > WARNING: at fs/btrfs/extent_io.c:4533 read_extent_buffer+0x1ef/0x220 [btrfs]() > Hardware name: System Product Name > Modules linked in: <omitted> [last unloaded: scsi_wait_scan] > Pid: 25829, comm: btrfs-endio-met Tainted: G W 3.4.4+btrfsdebug3+ #4 > [...] > ------------------------------------------------------------ > > Here''s the difference (lines reordered from diff to make comparing > easier): > > ------------------------------------------------------------ > -btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00 ..<............. > +btrfs: ffff8801b13cc618: 98 c5 3c b1 01 88 ff ff 02 00 00 00 00 00 00 00 ..<............. > -btrfs: ffff8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801b13cc628: dd 03 00 00 00 00 00 00 7b 1a a0 11 00 00 00 00 ........{....... > -btrfs: ffff8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801b13cc638: 31 34 71 3c 50 90 74 41 00 00 00 00 00 00 00 00 14q<P.tA........ 
> ------------------------------------------------------------

The diff is just the eb->debug[] Jan added.

>
> If there's one pattern that catches eye in the dumps, it's that in
> many places where the same eb is dumped multiple times due to multiple
> "checksum mismatch 4"s, there are bytes at offsets 0x30 and 0x31 that
> always seem to have the same value and both separately increase
> between the dumps, usually(?) by two. Then that might be normal, I
> haven't looked at what should be at a struct eb :) There are some
> instances where they wrap, for example
>
> f8 f8 -> fa fa -> fc fc -> fe fe -> 00 00 -> 02 02 -> ...,

this is the spinlock to protect the refcnt, so some reference taking/releasing has been going on.

>
> with no other changes visible in the corrupted ebs. But that is by no
> means the only change in the struct eb contents that can be observed,
> only the most obvious pattern.
>
> For example, here's a 11-long string of eb contents. First the dmesg:
>
> ------------------------------------------------------------
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 1]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 2]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 3]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 4]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 5]
> btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2
> btrfs: checksum mismatch 4 on 1127200522240
> [DUMP 6]
> btrfs: dm-6 checksum verify failed on 1127200522240
wanted 7712C045 found C593E2D6 level 2 > btrfs: checksum mismatch 4 on 1127200522240 > [DUMP 7] > btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2 > btrfs: checksum mismatch 4 on 1127200522240 > [DUMP 8] > btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2 > btrfs: checksum mismatch 4 on 1127200522240 > [DUMP 9] > btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2 > btrfs: checksum mismatch 4 on 1127200522240 > [DUMP 10] > btrfs: dm-6 checksum verify failed on 1127200522240 wanted 7712C045 found C593E2D6 level 2 > btrfs: checksum mismatch 4 on 1127200522240 > [DUMP 11] > ------------------------------------------------------------ > > Where DUMP 1 contents (file cc1.txt if you''d want to apply the diffs > below) is > > ------------------------------------------------------------ > btrfs: --- start eb contents at ffff8801cafbacc0 --- > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > btrfs: ffff8801cafbad30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad40: 00 00 10 00 00 00 00 00 04 04 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad50: 50 ad fb ca 01 88 ff ff 50 ad fb ca 01 88 ff ff P.......P....... > btrfs: ffff8801cafbad60: 02 02 00 00 00 00 00 00 68 ad fb ca 01 88 ff ff ........h....... 
> btrfs: ffff8801cafbad70: 68 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 h............... > btrfs: ffff8801cafbad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad90: 00 a2 20 08 00 ea ff ff c0 2a f4 07 00 ea ff ff .. ......*...... > btrfs: ffff8801cafbada0: 00 ac 89 07 00 ea ff ff c0 42 79 07 00 ea ff ff .........By..... > btrfs: ffff8801cafbadb0: 80 af 77 07 00 ea ff ff 40 37 49 08 00 ea ff ff ..w.....@7I..... > btrfs: ffff8801cafbadc0: 80 6d f3 07 00 ea ff ff c0 a7 e8 07 00 ea ff ff .m.............. > btrfs: ffff8801cafbadd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801cafbacc0 --- > ------------------------------------------------------------ > > And the successive diffs are > > ------------------------------------------------------------ > --- cc1.txt 2012-07-10 06:57:21.564665577 +0300 > +++ cc2.txt 2012-07-10 06:59:10.272634578 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... 
> -btrfs: ffff8801cafbacf0: 13 13 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................the spinlock> btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > @@ -20,9 +20,9 @@ > btrfs: ffff8801cafbade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbadf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > -btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 00 00 00 00 00 00 00 00 ................ > -btrfs: ffff8801cafbae20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > -btrfs: ffff8801cafbae30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbae10: 90 ad fb ca 01 88 ff ff 01 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbae20: 4d 00 00 00 00 00 00 00 4e 00 00 00 00 00 00 00 M.......N....... > +btrfs: ffff8801cafbae30: 07 78 02 00 00 00 00 00 00 00 00 00 00 00 00 00 .x..............and eb->debug> btrfs: ffff8801cafbae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbae50: 00 00 00 00 00 00 00 00 ........ > btrfs: --- end eb contents at ffff8801cafbacc0 --- > --- cc2.txt 2012-07-10 06:59:10.272634578 +0300 > +++ cc3.txt 2012-07-10 06:59:10.016634663 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... 
> -btrfs: ffff8801cafbacf0: 15 15 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc3.txt 2012-07-10 06:59:10.016634663 +0300 > +++ cc4.txt 2012-07-10 06:59:09.752634749 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 17 17 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc4.txt 2012-07-10 06:59:09.752634749 +0300 > +++ cc5.txt 2012-07-10 06:59:09.504634831 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 19 19 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
> btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc5.txt 2012-07-10 06:59:09.504634831 +0300 > +++ cc6.txt 2012-07-10 06:59:09.240634917 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1b 1b 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc6.txt 2012-07-10 06:59:09.240634917 +0300 > +++ cc7.txt 2012-07-10 06:59:08.944635015 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1d 1d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... 
> --- cc7.txt 2012-07-10 06:59:08.944635015 +0300 > +++ cc8.txt 2012-07-10 06:59:08.656635109 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 1f 1f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ > +btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc8.txt 2012-07-10 06:59:08.656635109 +0300 > +++ cc9.txt 2012-07-10 06:59:08.328635216 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ > btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0....... > -btrfs: ffff8801cafbacf0: 21 21 00 00 03 00 00 00 00 00 00 00 00 00 00 00 !!.............. > +btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##.............. > btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "............... > --- cc9.txt 2012-07-10 06:59:08.328635216 +0300 > +++ cc10.txt 2012-07-10 06:58:33.180646115 +0300 > @@ -2,7 +2,7 @@ > btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............ 
>  btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0.......
> -btrfs: ffff8801cafbacf0: 23 23 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ##..............
> +btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%..............
>  btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "...............
> --- cc10.txt 2012-07-10 06:58:33.180646115 +0300
> +++ cc11.txt 2012-07-10 06:58:41.340643696 +0300
> @@ -2,7 +2,7 @@
>  btrfs: ffff8801cafbacc0: 00 00 63 72 06 01 00 00 00 80 00 00 00 00 00 00 ..cr............
>  btrfs: ffff8801cafbacd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbace0: 35 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff 5.......0.......
> -btrfs: ffff8801cafbacf0: 25 25 00 00 03 00 00 00 00 00 00 00 00 00 00 00 %%..............
> +btrfs: ffff8801cafbacf0: 27 27 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ''..............
>  btrfs: ffff8801cafbad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbad10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  btrfs: ffff8801cafbad20: 22 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "...............
> ------------------------------------------------------------

All spinlocks.

I'll go through this again with Jan to see if his eb->debug might give
any clue here. It might be good to have a look at the contents of the
pages though, as written in the previous mail.
-Arne

>
> And since it's of course possible there's something wrong with my
> modifications, here's the patch I used:
>
> ------------------------------------------------------------
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7452ecb..aadb82c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4527,6 +4527,9 @@ void read_extent_buffer(struct extent_buffer *eb, void *dstv,
>  			len,
>  			eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3],
>  			eb->debug[4], eb->debug[5], eb->debug[6], eb->debug[7]);
> +		printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
> +		print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
> +		printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
>  		WARN_ON(1);
>  	}
>  	WARN_ON(start + len > eb->start + eb->len);
> diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
> index ea81bd4..663d6c4 100644
> --- a/fs/btrfs/reada.c
> +++ b/fs/btrfs/reada.c
> @@ -130,9 +130,13 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
>  	kref_get(&re->refcnt);
>  	spin_unlock(&fs_info->reada_lock);
>
> -	if (!err && btrfs_csum_tree_block(root, eb))
> +	if (!err && btrfs_csum_tree_block(root, eb)) {
>  		printk(KERN_ERR "btrfs: checksum mismatch 4 on %llu\n",
>  			eb->start);
> +		printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
> +		print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
> +		printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
> +	}
>
>  	if (!re)
>  		return -1;
> @@ -150,9 +154,14 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
>  	if (err == 0) {
>  		nritems = level ? btrfs_header_nritems(eb) : 0;
>  		if (level > BTRFS_MAX_LEVEL ||
> -		    nritems > BTRFS_NODEPTRS_PER_BLOCK(root))
> +		    nritems > BTRFS_NODEPTRS_PER_BLOCK(root)) {
>  			printk(KERN_ERR "btrfs: node seems invalid now. checksum ok = %d\n",
>  				btrfs_csum_tree_block(root, eb));
> +			printk(KERN_ERR "btrfs: --- start eb contents at %p ---\n", eb);
> +			print_hex_dump(KERN_ERR, "btrfs: ", DUMP_PREFIX_ADDRESS, 16, 1, eb, sizeof(*eb), true);
> +			printk(KERN_ERR "btrfs: --- end eb contents at %p ---\n", eb);
> +
> +		}
>  		generation = btrfs_header_generation(eb);
>  		/*
>  		 * FIXME: currently we just set nritems to 0 if this is a leaf,
>
> ------------------------------------------------------------
>
> Sami

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
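A note on the "All spinlocks" remark: the two bytes at offset 0x30 and 0x31 of the dumped eb always hold the same value and advance together (13 13, 15 15, ... and f8 f8 up to fe fe, then 00 00). That is the behaviour one would expect from a ticket spinlock with 8-bit tickets. The following is only a rough userspace sketch under that assumption; the struct and function names are made up for illustration and are not the kernel's, and the model is single-threaded, so it only tracks how the two bytes evolve over uncontended lock/unlock cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a ticket spinlock with 8-bit tickets.
 * Not kernel code; the names are illustrative only. */
struct ticket_lock {
	uint8_t head;	/* ticket currently being served */
	uint8_t tail;	/* next ticket to hand out */
};

/* Take a ticket. In this uncontended model it is served at once. */
uint8_t ticket_lock_acquire(struct ticket_lock *l)
{
	uint8_t my_ticket = l->tail++;	/* uint8_t wraps from 0xff to 0x00 */

	assert(l->head == my_ticket);	/* nobody else holds the lock */
	return my_ticket;
}

/* Release: serve the next ticket. */
void ticket_lock_release(struct ticket_lock *l)
{
	l->head++;
}

/* Run n uncontended lock/unlock cycles. */
void lock_unlock_cycles(struct ticket_lock *l, unsigned int n)
{
	while (n--) {
		ticket_lock_acquire(l);
		ticket_lock_release(l);
	}
}

/* The lock is free exactly when both bytes are equal. */
int ticket_lock_is_free(const struct ticket_lock *l)
{
	return l->head == l->tail;
}
```

In this model, starting from 13 13 and running two lock/unlock cycles gives 15 15, which matches the step between successive dumps. If the assumption holds, the changing bytes only show that the lock was taken and released between dumps, not that the eb itself was corrupted.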
Arne Jansen
2012-Jul-16 08:20 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
Any news on this? If you give me some hints, I can try to reproduce it
here.

-Arne

On 10.07.2012 08:57, Arne Jansen wrote:
> [...]
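For anyone wanting to repeat the check made earlier in this thread (the bogus generation 4412718571037421157 reading as the ASCII text "ent() =="), here is a small userspace sketch. The helper name is made up for illustration; it assumes the value sits in memory in little-endian byte order, as on x86, and prints non-printable bytes as '.', like the text column of the hex dumps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: render a 64-bit value as the 8 bytes it occupies
 * in little-endian memory. Non-printable bytes become '.'.
 * out must have room for 9 bytes (8 characters plus the NUL). */
void u64_le_to_ascii(uint64_t v, char out[9])
{
	for (int i = 0; i < 8; i++) {
		/* byte i is the i-th least significant byte */
		unsigned char c = (unsigned char)(v >> (8 * i));

		out[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
	}
	out[8] = '\0';
}
```

Feeding it 4412718571037421157 yields "ent() ==". The generation from the later report, 4716553384049587249, yields "14q<P.tA", the same text that appears next to the eb->debug bytes in that dump, so that value does not look like ASCII source text.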
Sami Liedes
2012-Jul-16 21:29 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Mon, Jul 16, 2012 at 10:20:28AM +0200, Arne Jansen wrote:
> Any news on this? If you give me some hints, I can try to reproduce
> it here.

I've been planning for a few days now to try whether it's reproducible
in a virtual machine with the same filesystem images. However, I
haven't gotten around to doing this yet...

I tried to get KMEMCHECK working on my computer, but failed (and
generated a bug report). It seems to work in KVM though.

So, currently my idea is to boot the machine with a live USB stick,
install kvm and make qemu qcow images backed by the real (2*1.1T)
devices, but writing changes to the qcow images (I dare not mess with
the actual devices, and don't happen to have quite 2.2T extra space
outside of them...), and try to run scrub there. If that succeeds and
the bug happens there too, debugging *should* be easier, and it
*should* be possible to run it under KMEMCHECK too. If the bug doesn't
happen inside a virtual machine, that would be interesting information
too.

Another idea might be to use LVM snapshots of the actual devices, but
recent messages have made me wary of that approach - plus it's
somewhat of a pain, because the snapshotting would have to be done
before mounting anyway, since I have two devices and I doubt LVM
supports snapshotting two separate devices at the exact same moment.

However, I'm a bit ill at the moment, so I assume it might be at least
a few days until I get to actually implementing this...

	Sami
Sami Liedes
2012-Jul-28 12:08 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, Jul 17, 2012 at 12:29:33AM +0300, Sami Liedes wrote:
> So, currently my idea is to boot the machine with a live USB stick,
> install kvm and make qemu qcow images backed by the real (2*1.1T)
> devices, but writing changes to the qcow images (I dare not mess with
> the actual devices, and don't happen to have quite 2.2T extra space
> outside of them...), and try to run scrub there. If that succeeds and
> the bug happens there too, debugging *should* be easier, and it
> *should* be possible to run it under KMEMCHECK too. If the bug doesn't
> happen inside a virtual machine, that would be interesting information
> too.

I have now been able to reproduce the bug in KVM with the setup described above. I think it's safe to say now that the bug depends on some kind of interaction between btrfs and dm-crypt.

With the following setup, the bug does NOT happen:

* kvm, single cpu
* sees 3 disks, /dev/vda=root, /dev/vdb=btrfs-dev1, /dev/vdc=btrfs-dev2
* The btrfs devices are essentially snapshots of the real btrfs devices in raid-1 configuration (2*1.1T). As the real devices are encrypted, the decryption is done outside the KVM, i.e. the KVM snapshots are backed by the decrypted devices.

With the following setup, the bug DOES happen:

* kvm, single cpu
* sees 3 disks, /dev/vda=root, /dev/vdb=part1, /dev/vdc=part2, where part[12] are LUKS containers containing the individual btrfs devices
* inside kvm, they are opened using
    cryptsetup luksOpen /dev/vdb root1
    cryptsetup luksOpen /dev/vdc root2
* after this, the filesystem is mounted with
    mount /dev/mapper/root1 /media -o device=/dev/mapper/root1,device=/dev/mapper/root2
* The devices are snapshots of the actual physical encrypted partitions containing the btrfs devices.

I have not yet figured out if this can be reproduced using a pristine, smaller btrfs filesystem in raid-1 configuration inside KVM or if there's something about my specific filesystem that causes this. I can investigate that too; it's easier for me to do than the above testing, as I don't need to have continuous physical access to the computer to do that.

Here's the .config of the kernel I used inside KVM to reproduce this:

http://www.niksula.hut.fi/~sliedes/btrfs/config.3.4.4

I also ran the same tests with KMEMCHECK. Both with and without crypto, there were quite a number of (of course possibly false) warnings from btrfs code. I doubt any of them are related to this bug, since there were no KMEMCHECK warnings during the scrub operation. Here are the logs, anyway:

http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.nocrypto.gz
http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.crypto.gz

Sami
Sami Liedes
2012-Jul-28 18:50 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Sat, Jul 28, 2012 at 03:08:47PM +0300, Sami Liedes wrote:
> I have not yet figured out if this can be reproduced using a pristine,
> smaller btrfs filesystem in raid-1 configuration inside KVM or if
> there's something about my specific filesystem that causes this. I can
> investigate that too; it's easier to do for me than the above testing,
> as I don't need to have continuous physical access to the computer to
> do that.

It seems the bug doesn't happen with a new filesystem made with

    mkfs.btrfs -mraid1 -draid1 /dev/mapper/btrfs_crypt /dev/mapper/btrfs2_crypt

and subsequently filled with data... Unfortunate.

I started to wonder what's so special about the second device of my filesystem. All the errors always seem to come from that device. The only thing that comes to mind is that that device was not originally part of the filesystem; it started as a single-device filesystem formatted with mkfs.btrfs with default options and was subsequently rebalanced under a 3.4.2 kernel.

So I started to play with btrfs fi balance in my KVM instance with my two-device filesystem, and hit this oops, which may or may not be related to my previous bug and/or dm-crypt...

------------------------------------------------------------
[ 15.195342] device fsid a844eb60-eb9c-4e24-ae91-c1627bf2d439 devid 1 transid 176 /dev/mapper/btrfs_crypt
[ 15.196606] device fsid a844eb60-eb9c-4e24-ae91-c1627bf2d439 devid 2 transid 176 /dev/mapper/btrfs2_crypt
[ 15.197895] device fsid a844eb60-eb9c-4e24-ae91-c1627bf2d439 devid 1 transid 176 /dev/mapper/btrfs_crypt
[ 15.200202] btrfs: disk space caching is enabled
# btrfs device delete [something...]
[ 1462.242456] btrfs: unable to go below two devices on raid1
# btrfs fi balance start -mconvert=dup -dconvert=raid0 /media
[ 1895.048075] btrfs: unable to start balance with target metadata profile 32
# btrfs fi balance start -dconvert=raid0 /media
[ 1917.106536] btrfs: relocating block group 10229907456 flags 17
[ 1929.188609] btrfs: relocating block group 8887730176 flags 17
[ 1944.690916] btrfs: found 2152 extents
[ 1947.016210] btrfs: found 2152 extents
[ 1947.421397] btrfs: relocating block group 7545552896 flags 17
[ 2024.225203] btrfs: found 36762 extents
[ 2094.983055] btrfs: corrupt node block=8830455808,root=1: generation (197) too new in slot 0 (maximum expected 196)
[ 2094.984858] ------------[ cut here ]------------
[ 2094.986076] WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xa5/0xc0()
[ 2094.987912] Hardware name: Bochs
[ 2094.988735] btrfs: Transaction abortedPid: 1623, comm: btrfs-transacti Not tainted 3.4.4 #9
[ 2094.988741] Call Trace:
[ 2094.989361] [<ffffffff8103da65>] warn_slowpath_common+0x75/0xb0
[ 2094.990829] [<ffffffff8103db11>] warn_slowpath_fmt+0x41/0x50
[ 2094.992243] [<ffffffff8148a2b5>] __btrfs_abort_transaction+0xa5/0xc0
[ 2094.993521] [<ffffffff81498463>] __btrfs_free_extent+0x213/0x7b0
[ 2094.994176] [<ffffffff8149cac7>] ? run_clustered_refs+0xd7/0xa10
[ 2094.994835] [<ffffffff8158d9fd>] ? do_raw_spin_unlock+0x5d/0xb0
[ 2094.995481] [<ffffffff8149ce05>] run_clustered_refs+0x415/0xa10
[ 2094.996143] [<ffffffff814f13e8>] ? find_ref_head+0xb8/0xe0
[ 2094.996806] [<ffffffff8149d499>] ? btrfs_run_delayed_refs+0x99/0x430
[ 2094.997505] [<ffffffff8149d56d>] btrfs_run_delayed_refs+0x16d/0x430
[ 2094.998188] [<ffffffff81497c19>] ? next_block_group.isra.65+0x29/0x80
[ 2094.998890] [<ffffffff8176f326>] ? _raw_spin_unlock+0x26/0x30
[ 2094.999520] [<ffffffff8149d8e0>] btrfs_write_dirty_block_groups+0xb0/0x630
[ 2095.000281] [<ffffffff814d365a>] ? free_extent_buffer+0x1a/0x70
[ 2095.000931] [<ffffffff8176905d>] commit_cowonly_roots+0xe7/0x1c1
[ 2095.001610] [<ffffffff814aed29>] btrfs_commit_transaction+0x519/0xa40
[ 2095.002317] [<ffffffff8105f1d0>] ? abort_exclusive_wait+0xb0/0xb0
[ 2095.002997] [<ffffffff814a7a25>] transaction_kthread+0x245/0x2c0
[ 2095.003673] [<ffffffff814a77e0>] ? check_leaf.isra.105+0x300/0x300
[ 2095.004372] [<ffffffff8105e57e>] kthread+0x8e/0xa0
[ 2095.004902] [<ffffffff81771464>] kernel_thread_helper+0x4/0x10
[ 2095.005549] [<ffffffff8105e4f0>] ? kthread_flush_work_fn+0x10/0x10
[ 2095.006222] [<ffffffff81771460>] ? gs_change+0x13/0x13
[ 2095.006787] ---[ end trace 8341f112debcf176 ]---
[ 2095.007287] BTRFS error (device dm-1) in __btrfs_free_extent:5134: IO failure
[ 2095.008059] btrfs is forced readonly
[ 2095.008454] btrfs: run_one_delayed_ref returned -5
[ 2095.008455] BTRFS error (device dm-1) in btrfs_run_delayed_refs:2454: IO failure
[ 2176.876382] ------------[ cut here ]------------
[ 2176.877217] kernel BUG at fs/btrfs/relocation.c:3733!
[ 2176.878139] invalid opcode: 0000 [#1] SMP
[ 2176.879049] CPU 5
[ 2176.879435] Pid: 1676, comm: btrfs Tainted: G W 3.4.4 #9 Bochs Bochs
[ 2176.880383] RIP: 0010:[<ffffffff814f97d3>] [<ffffffff814f97d3>] relocate_block_group+0x643/0x690
[ 2176.880383] RSP: 0000:ffff880002bfdb08 EFLAGS: 00010206
[ 2176.880383] RAX: ffffffffffffffe2 RBX: ffffffffffffffe2 RCX: ffff880007aade10
[ 2176.880383] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880006fbe800
[ 2176.880383] RBP: ffff880002bfdb88 R08: 0000000000000003 R09: 0000000000000000
[ 2176.880383] R10: 0000000100072905 R11: 0000000000000001 R12: ffff8800037a8020
[ 2176.880383] R13: 0000000000005ee7 R14: ffff880006bbae10 R15: ffff8800037a8000
[ 2176.880383] FS: 0000000000000000(0000) GS:ffff880007d40000(0063) knlGS:00000000f75be720
[ 2176.880383] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 2176.880383] CR2: 00000000f75c2c80 CR3: 0000000006664000 CR4: 00000000000006a0
[ 2176.880383] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2176.880383] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2176.880383] Process btrfs (pid: 1676, threadinfo ffff880002bfc000, task ffff880007aad880)
[ 2176.880383] Stack:
[ 2176.880383] ffff880005c94fb0 ffff880002bfdb50 0000000000000001 0000000000000001
[ 2176.880383] ffff880005c94bc8 0000000000000000 ffff880002bfdb88 00ffffff814afd9a
[ 2176.880383] a800000001ebfdf0 000000000000c000 ffff8800037a8000 ffff8800037a8000
[ 2176.880383] Call Trace:
[ 2176.880383] [<ffffffff814f99c4>] btrfs_relocate_block_group+0x1a4/0x2d0
[ 2176.880383] [<ffffffff8106cbad>] ? __wake_up+0x2d/0x70
[ 2176.880383] [<ffffffff814d67e5>] btrfs_relocate_chunk.isra.52+0x65/0x700
[ 2176.880383] [<ffffffff814d365a>] ? free_extent_buffer+0x1a/0x70
[ 2176.880383] [<ffffffff8158d9fd>] ? do_raw_spin_unlock+0x5d/0xb0
[ 2176.880383] [<ffffffff8176f326>] ? _raw_spin_unlock+0x26/0x30
[ 2176.880383] [<ffffffff814ce1e2>] ? release_extent_buffer.isra.41+0x32/0xc0
[ 2176.880383] [<ffffffff814d365a>] ? free_extent_buffer+0x1a/0x70
[ 2176.880383] [<ffffffff814d366a>] ? free_extent_buffer+0x2a/0x70
[ 2176.880383] [<ffffffff814dacbf>] btrfs_balance+0x7ff/0xce0
[ 2176.880383] [<ffffffff814df9d9>] btrfs_ioctl_balance.isra.51+0x139/0x430
[ 2176.880383] [<ffffffff814e3245>] btrfs_ioctl+0x95/0x1260
[ 2176.880383] [<ffffffff81063ace>] ? up_read+0x1e/0x40
[ 2176.880383] [<ffffffff81024abc>] ? do_page_fault+0x1ac/0x490
[ 2176.880383] [<ffffffff810ebf9b>] ? __vma_link_rb+0x2b/0x30
[ 2176.880383] [<ffffffff81152796>] compat_sys_ioctl+0x96/0x1310
[ 2176.880383] [<ffffffff81587a49>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 2176.880383] [<ffffffff81771612>] sysenter_dispatch+0x7/0x25
[ 2176.880383] Code: ff ff 66 0f 1f 44 00 00 41 0f b6 87 59 06 00 00 83 c8 08 41 88 87 59 06 00 00 e9 10 fe ff ff bb f4 ff ff ff e9 88 fc ff ff 0f 0b <0f> 0b c7 45 98 00 00 00 00 4c 89 f7 e8 cc 27 f9 ff 48 83 ca ff
[ 2176.880383] RIP [<ffffffff814f97d3>] relocate_block_group+0x643/0x690
[ 2176.880383] RSP <ffff880002bfdb08>
[ 2176.922828] ---[ end trace 8341f112debcf177 ]---
------------------------------------------------------------

Sami
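For reading the bare numbers in the log above, a small decoding sketch may help. It assumes the block group flag bits of the btrfs on-disk format for kernels of this era and Linux errno numbering; the helper names are invented for illustration and are not part of any btrfs tool.

```python
import errno

# Block group flag bits from the btrfs on-disk format
# (assumed for 3.4-era kernels; later kernels define more bits).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_block_group_flags(flags):
    """Return the names of all known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items()) if flags & bit]


def decode_errno_register(value):
    """Read a 64-bit register value as a signed number.

    Returns (signed_value, errno_name); errno_name is None unless the
    signed value lies in the kernel's error range -4095..-1.
    """
    signed = value - (1 << 64) if value >= (1 << 63) else value
    if -4095 <= signed < 0:
        return signed, errno.errorcode.get(-signed, "unknown")
    return signed, None


if __name__ == "__main__":
    print("flags 17   ->", decode_block_group_flags(17))
    print("profile 32 ->", decode_block_group_flags(32))
    print("RAX        ->", decode_errno_register(0xFFFFFFFFFFFFFFE2))
```

Read this way, "flags 17" is a RAID1 data block group and "target metadata profile 32" is DUP. RAX and RBX in the second oops hold -30, which is EROFS on Linux. That would fit with the filesystem having just been forced read-only, although nothing in this thread confirms that this is the value the BUG at relocation.c:3733 tripped on.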
Stefan Behrens
2013-Mar-27 11:54 UTC
Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2
On Tue, 3 Jul 2012 02:01:21 +0300, Sami Liedes wrote:> Hi, > > I just got this oops on a computer running 3.4.2. > > A few minutes before I had started "btrfs device scrub /" and had a > watcher process running "btrfs scrub status /" every 5 seconds. After > a few gigabytes of scrubbing, I got this crash. > > The oops is transcribed from photos, so it may contain some errors. I > tried to be careful, and double checked the backtrace. > > Sami > > ------------------------------------------------------------ > general protection fault: 0000 [#1] SMP > CPU 4 > Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables ebtable_nat ebtables cn rfcomm bnep > parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE > ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm acpi_cpufreq snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmi > di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev mperf pci_hotplug i2c_i801 i2c_core processor button > intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod nbd btrfs libcrc32c > zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata aes_generic cryptd scsi_mod e1000e thermal fa > n thermal_sys [last unloaded: 
scsi_wait_scan] > > Pid: 30863, comm: btrfs-endio-met Tainted: G W 3.4.2 #1 System manufacturer System Product Name/P8P67 EVO > RIP: 0010:[<ffffffff811e83bd>] [<ffffffff811e83bd>] memcpy+0xd/0x110 > RSP: 0000:ffff88003174dba8 EFLAGS: 00010202 > RAX: ffff88003174dc8f RBX: 0000000000000011 RCX: 0000000000000002 > RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff88003174dc8f > RBP: ffff88003174dbf0 R08: 000000000000000a R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003174dca0 > R13: ffff8800659f42b0 R14: 0000000000000048 R15: 0000000000000011 > FS: 0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 000000000973c000 CR3: 0000000167ef3000 CR4: 00000000000407e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process btrfs-endio-met (pid: 30863, threadinfo ffff88003174c000, task ffff88006f818000) > Stack: > ffffffffa026bd6b ffff8801960f5000 0000000000008003 0000000000001000 > ffff88003174dc58 00000000000003dd ffff88000ac13c60 ffff88003174dc58 > 696f70203a61685f ffff88003174dc00 ffffffffa026904d ffff88003174dcd0 > Call Trace: > [<ffffffffa026bd6b>] ? read_extent_buffer+0xbb/0x110 [btrfs] > [<ffffffffa026304d>] btrfs_node_key+0x1d/0x20 [btrfs] > [<ffffffffa02994e0>] __readahead_hook.isra.5+0x3c0/0x420 [btrfs] > [<ffffffffa029986f>] btree_readahead_hook+0x1f/0x40 [btrfs] > [<ffffffffa023f841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs] > [<ffffffffa0267452>] ? find_first_extent_bit_state+0x22/0x80 [btrfs] > [<ffffffffa026809b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs] > [<ffffffffa023ee61>] ? end_workqueue_fn+0x31/0x50 [btrfs] > [<ffffffff81158958>] bio_endio+0x18/0x30 > [<ffffffffa023ee6c>] end_workqueue_fn+0x3c/0x50 [btrfs] > [<ffffffffa0275857>] worker_loop+0x157/0x560 [btrfs] > [<ffffffffa0275700>] ? 
btrfs_queue_worker+0x310/0x310 [btrfs] > [<ffffffff81058e5e>] kthread+0x8e/0xa0 > [<ffffffff81418fe4>] kernel_thread_helper+0x4/0x10 > [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70 > [<ffffffff81418fe0>] ? gs_change+0x13/0x13 > Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90... > 8 4c 8b 56 10 4c > RIP [<ffffffff811e83bd>] memcpy+0xd/0x110 > RSP <ffff88003174dba8> >Tonight my box had the same on a system running 3.7.10. Also with a Btrfs scrub running. general protection fault: 0000 [#1] SMP Modules linked in: xt_LOG xt_limit xt_multiport iptable_nat nf_nat_ipv4 nf_nat ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_mangle ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth sp5100_tco edac_core edac_mce_amd k8temp i2c_piix4 kvm_amd kvm e1000e serio_raw mac_hid shpchp lp parport btrfs zlib_deflate libcrc32c raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear ahci libahci [last unloaded: ipmi_msghandler] CPU 1 Pid: 10376, comm: btrfs-endio-met Not tainted 3.7.10-030710-generic #201302271235 MICRO-STAR INTERNATIONAL CO., LTD MS-96B3/MS-96B3 RIP: 0010:[<ffffffff8134dcc2>] [<ffffffff8134dcc2>] memcpy+0x12/0x110 RSP: 0018:ffff88011791fbc0 EFLAGS: 00010206 RAX: ffff88011791fca5 RBX: ffff88005d6836d8 RCX: 0000000000000003 RDX: 0000000000000003 RSI: 0005080000000000 RDI: ffff88011791fca5 RBP: ffff88011791fc08 R08: 0000000000003fd1 R09: ffff88011791fbd8 R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000003 R13: 0000000000000003 R14: ffff88011791fca8 R15: 0000000000000028 FS: 00007f78717fa700(0000) GS:ffff88011fc80000(0000) knlGS:00000000f743f700 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffff600000 CR3: 0000000092a4e000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-endio-met (pid: 10376, threadinfo ffff88011791e000, task ffff8801128a1700)
Stack:
 ffffffffa00f1e5c ffff88010e407000 0000000000003000 0000000000001000
 ffff88011791fca8 ffff880073b5cea0 9b9628ac89c7f24a ffff88011791fca8
 00000000000001ed ffff88011791fc18 ffffffffa00e87f2 ffff88011791fce8
Call Trace:
 [<ffffffffa00f1e5c>] ? read_extent_buffer+0xbc/0x120 [btrfs]
 [<ffffffffa00e87f2>] btrfs_node_key+0x22/0x30 [btrfs]
 [<ffffffffa01255d3>] __readahead_hook.isra.4+0x3d3/0x410 [btrfs]
 [<ffffffffa01259d4>] btree_readahead_hook+0x24/0x40 [btrfs]
 [<ffffffffa00cd7b9>] btree_readpage_end_io_hook+0x149/0x290 [btrfs]
 [<ffffffffa00ee912>] end_bio_extent_readpage+0x142/0x330 [btrfs]
 [<ffffffffa00cbe66>] ? end_workqueue_fn+0x36/0x50 [btrfs]
 [<ffffffff811c8a5d>] bio_endio+0x1d/0x40
 [<ffffffffa00cbe71>] end_workqueue_fn+0x41/0x50 [btrfs]
 [<ffffffffa00fcd80>] worker_loop+0xa0/0x320 [btrfs]
 [<ffffffffa00fcce0>] ? check_pending_worker_creates.isra.1+0xf0/0xf0 [btrfs]
 [<ffffffff8107dca0>] kthread+0xc0/0xd0
 [<ffffffff8107dbe0>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff816d19ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107dbe0>] ? flush_kthread_worker+0xb0/0xb0
Code: 4e 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d
RIP [<ffffffff8134dcc2>] memcpy+0x12/0x110
 RSP <ffff88011791fbc0>
---[ end trace 76ff4bab3b10384b ]---

Looking at memcpy() in arch/x86/lib/memcpy_64.S, RDX is the length, RSI the source and RDI the destination. And looking at the registers when the issue happens, there's a pattern. RDX is not as expected, RSI is 000508000000.... and RDI looks good.

The one from tonight: RDX is 0x3 (0x11 would be expected), RSI is 0005080000000000 (unexpected), RDI is ffff88011791fca5 (looks good).
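A caveat on the expected RDX value: the Code: bytes of the 3.7.10 trace (48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4) decode to mov %rdx,%rcx; shr $3,%rcx; and $7,%edx; rep movsq; mov %edx,%ecx; rep movsb. If that reading is right, this memcpy variant overwrites RDX with length & 7 before it touches memory, so RDX at fault time is no longer the original length. Below is a minimal sketch of how the remaining byte count could be reconstructed. The function name and the `faulting` labels are purely illustrative, and applying it to the 3.4.2 trace assumes that build uses the same instruction sequence, which its truncated Code: line does not let us confirm.

```python
def remaining_bytes_at_fault(rcx, rdx, faulting):
    """Bytes still to be copied when the 'rep movs' based memcpy faulted.

    faulting: "movsq" if RIP points at 'rep movsq' (memcpy+0xd in these traces),
              "movsb" if RIP points at 'rep movsb' (memcpy+0x12).
    """
    if faulting == "movsq":
        # RCX counts the 8-byte words left, RDX holds the tail (length & 7).
        return rcx * 8 + rdx
    if faulting == "movsb":
        # The qword phase has finished; RCX was reloaded from RDX and counts bytes.
        return rcx
    raise ValueError("unknown faulting instruction: %r" % (faulting,))


if __name__ == "__main__":
    # 3.4.2 trace: RIP memcpy+0xd, RCX=2, RDX=1
    print(hex(remaining_bytes_at_fault(2, 1, "movsq")))
    # 3.7.10 trace: RIP memcpy+0x12, RCX=3, RDX=3
    print(hex(remaining_bytes_at_fault(3, 3, "movsb")))
```

With the register values from the two traces this gives 17 (0x11) for the 3.4.2 oops and 3 for the 3.7.10 one. Since RSI in the 3.7.10 trace ends in 000, a 3-byte copy from the start of a page could be the tail of a 17-byte key that straddles a page boundary, but that is a guess and not something the traces prove.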
The one from Sami in the quoted mail is exactly the same issue that my box had tonight. RDX is 0x1 (0x11 would be expected), RSI is 0005080000000003, RDI looks good.

"GPF in read_extent_buffer while scrubbing on 3.7.0-rc8-00014-g27d7c2a" http://comments.gmane.org/gmane.comp.file-systems.btrfs/21559 is the same, but this time RDX is 0x11 as expected. RSI is 0005080000000003 (unexpected), RDI looks good.

http://abrt.fedoraproject.org/faf/problems/326667/ shows a similar pattern (without Btrfs being involved); there, the "report backtrace 53429" looks somewhat similar (memcpy is the crashing function, RSI: 00050800000007e1, RDI would make sense).

What's causing this issue? The RSI register content, does it look like CPU flags or anything else known?
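One possible reading of the RSI value, offered as a hypothesis and not something confirmed in this thread: it is what page_address() would return for a NULL struct page pointer. The sketch below redoes that arithmetic. It assumes the x86-64 memory layout documented for kernels of this era (direct mapping at 0xffff880000000000, vmemmap at 0xffffea0000000000, 4 KiB pages) and a 64-byte struct page; all of these constants are assumptions about the affected builds, and the Python function names merely mirror the kernel helper for readability.

```python
MASK64 = (1 << 64) - 1

# Assumed x86-64 layout for 3.x kernels (not verified against the builds above).
PAGE_OFFSET = 0xFFFF880000000000    # start of the direct mapping of physical memory
VMEMMAP_START = 0xFFFFEA0000000000  # start of the struct page array (vmemmap)
PAGE_SHIFT = 12                     # 4 KiB pages
STRUCT_PAGE_SIZE = 64               # assumed sizeof(struct page)


def page_address(page_ptr):
    """Mimic page_address(): struct page pointer -> direct-map virtual address."""
    # C pointer subtraction: 64-bit wrap-around, then read as a signed value.
    diff = (page_ptr - VMEMMAP_START) & MASK64
    if diff >= 1 << 63:
        diff -= 1 << 64
    # Divide with truncation toward zero, as C does.
    if diff >= 0:
        pfn = diff // STRUCT_PAGE_SIZE
    else:
        pfn = -((-diff) // STRUCT_PAGE_SIZE)
    phys = (pfn << PAGE_SHIFT) & MASK64
    return (phys + PAGE_OFFSET) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit canonical form)."""
    top = (addr & MASK64) >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    null_kaddr = page_address(0)
    print("page_address(NULL) = %016x" % null_kaddr)
    print("canonical:", is_canonical(null_kaddr))
```

Under those assumptions page_address(NULL) comes out as 0005080000000000, and the values ending in 3 and 7e1 would be that base plus an offset within the page. The address is non-canonical (bits 63..47 are not all equal), which is why the access raises a general protection fault instead of an ordinary page fault. If this holds, the open question becomes why a NULL page pointer reached the copy.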