thr3ads.net - Btrfs devel - 3.5.3: kernel BUG at fs/btrfs/ctree.c:3451! [Sep 2012]

Marc MERLIN

2012-Sep-20 17:17 UTC

3.5.3: kernel BUG at fs/btrfs/ctree.c:3451!

I had a btrfs built on top of 5 drives (dmcrypt devices).

The drive then died while I was writing to the filesystem and my system
crashed and rebooted:

[384555.534020] sd 10:0:0:0: rejecting I/O to offline device                    
[384555.535057] sd 10:0:0:0: rejecting I/O to offline device                    
[384556.666885] ------------[ cut here ]------------                            
[384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache                     
[384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!                            
[384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP                           
[384556.687878] CPU 2                                                           

	/* push data from right to left */
	copy_extent_buffer(left, right,
			   btrfs_item_nr_offset(btrfs_header_nritems(left)),
			   btrfs_item_nr_offset(0),
			   push_items * sizeof(struct btrfs_item));

	push_space = BTRFS_LEAF_DATA_SIZE(root) -
		     btrfs_item_offset_nr(right, push_items - 1);

	copy_extent_buffer(left, right, btrfs_leaf_data(left) +
		     leaf_data_end(root, left) - push_space,
		     btrfs_leaf_data(right) +
		     btrfs_item_offset_nr(right, push_items - 1),
		     push_space);
	old_left_nritems = btrfs_header_nritems(left);
	BUG_ON(old_left_nritems <= 0);	/* <-- ctree.c:3451 */

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Marc MERLIN

2012-Sep-21 03:46 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
> I had a btrfs built on top of 5 drives (dmcrypt devices).
> 
> The drive then died while I was writing to the filesystem and my system
> crashed and rebooted:
> 
> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device
> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device
> [384556.666885] ------------[ cut here ]------------
> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache
> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!
> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP
> [384556.687878] CPU 2
>

Oh my, now I'm trying again with a new drive, and a big cp from an
existing array to a new one dies with:
[32042.079411] ------------[ cut here ]------------                             
[32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
[32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
[32042.099227] CPU 1                                                            
[32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
rc_core sparse_keymap

    int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
			    u64 length, u64 logical, struct page *page,
			    int mirror_num)
    {
	    struct bio *bio;
	    struct btrfs_device *dev;
	    DECLARE_COMPLETION_ONSTACK(compl);
	    u64 map_length = 0;
	    u64 sector;
	    struct btrfs_bio *bbio = NULL;
	    int ret;

	    BUG_ON(!mirror_num);	/* <-- extent_io.c:1884 */

This is more of a problem since I can't back up my filesystem (source is
ext4 and destination is btrfs).

Any suggestion on what went wrong here?

Thanks,
Marc

cwillu

2012-Sep-21 03:51 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On Thu, Sep 20, 2012 at 9:46 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
>> I had a btrfs built on top of 5 drives (dmcrypt devices).
>>
>> The drive then died while I was writing to the filesystem and my system
>> crashed and rebooted:
>>
>> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device
>> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device
>> [384556.666885] ------------[ cut here ]------------
>> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache
>> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!
>> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP
>> [384556.687878] CPU 2
>>
>
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> rc_core sparse_keymap
>
>     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
>                             u64 length, u64 logical, struct page *page,
>                             int mirror_num)
>     {
>             struct bio *bio;
>             struct btrfs_device *dev;
>             DECLARE_COMPLETION_ONSTACK(compl);
>             u64 map_length = 0;
>             u64 sector;
>             struct btrfs_bio *bbio = NULL;
>             int ret;
>
>             BUG_ON(!mirror_num); <<<<<
>
> This is more of a problem since I can't backup my filesystem (source is
> ext4 and destination is btrfs).
>
> Any suggestion on what went wrong here?

There should have been a stack trace as well as a couple of other things;
can you post those as well, please?

Liu Bo

2012-Sep-21 03:53 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On 09/21/2012 11:46 AM, Marc MERLIN wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
>> I had a btrfs built on top of 5 drives (dmcrypt devices).
>>
>> The drive then died while I was writing to the filesystem and my system
>> crashed and rebooted:
>>
>> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device
>> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device
>> [384556.666885] ------------[ cut here ]------------
>> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache
>> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!
>> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP
>> [384556.687878] CPU 2
>>
>  
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> rc_core sparse_keymap
> 
>     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> 			    u64 length, u64 logical, struct page *page,
> 			    int mirror_num)
>     {
> 	    struct bio *bio;
> 	    struct btrfs_device *dev;
> 	    DECLARE_COMPLETION_ONSTACK(compl);
> 	    u64 map_length = 0;
> 	    u64 sector;
> 	    struct btrfs_bio *bbio = NULL;
> 	    int ret;
> 
> 	    BUG_ON(!mirror_num); <<<<<
> 
> This is more of a problem since I can't backup my filesystem (source is
> ext4 and destination is btrfs).
> 
> Any suggestion on what went wrong here?
> 

Could you please show us the complete stack info?

thanks,
liubo

> Thanks,
> Marc
> 

Marc MERLIN

2012-Sep-21 04:11 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On Thu, Sep 20, 2012 at 09:51:59PM -0600, cwillu wrote:
> > Oh my, now I'm trying again with a new drive, and a big cp from an
> > existing array to a new one dies with:
> > [32042.079411] ------------[ cut here ]------------
> > [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> > [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> > [32042.099227] CPU 1
> > [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> > raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> > ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> > snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> > rc_core sparse_keymap
> >
> >     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> >                             u64 length, u64 logical, struct page *page,
> >                             int mirror_num)
> >     {
> >             struct bio *bio;
> >             struct btrfs_device *dev;
> >             DECLARE_COMPLETION_ONSTACK(compl);
> >             u64 map_length = 0;
> >             u64 sector;
> >             struct btrfs_bio *bbio = NULL;
> >             int ret;
> >
> >             BUG_ON(!mirror_num); <<<<<
> >
> > This is more of a problem since I can't backup my filesystem (source is
> > ext4 and destination is btrfs).
> >
> > Any suggestion on what went wrong here?
> 
> There should have been a stack trace as well as a couple other things,
> can you post those as well please?

Actually, I found a few more lines in syslog just before the crash:
 kernel: [32008.938796] lost page write due to I/O error on /dev/mapper/crypt_e0e810c2-0d8f-409f-9674-e05763083a45
 kernel: [32008.938800] btrfs: bdev /dev/mapper/crypt_e0e810c2-0d8f-409f-9674-e05763083a45 errs: wr 1933, rd 0, flush 32, corrupt 0, gen 0
 kernel: [32008.954383] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.954386] btrfs: bdev /dev/dm-6 errs: wr 1490, rd 0, flush 18, corrupt 0, gen 0
 kernel: [32008.969038] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.969043] btrfs: bdev /dev/dm-6 errs: wr 1491, rd 0, flush 18, corrupt 0, gen 0
 kernel: [32008.979997] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.980002] btrfs: bdev /dev/dm-6 errs: wr 1492, rd 0, flush 18, corrupt 0, gen 0

That helps answer my question: disk error caused the crash.
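
For anyone collecting these counters from a log, the per-device error lines can be pulled apart with a short script. The Python sketch below is only an illustration: the pattern is inferred from the lines quoted in this message, the names `ERRS_RE` and `parse_errs` are made up for the example, and other kernel versions may print the counters differently.

```python
import re

# Pattern inferred from the "btrfs: bdev ... errs:" lines quoted above.
ERRS_RE = re.compile(
    r"btrfs: bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(line):
    """Return (device, counters) for a btrfs error-counter line, else None."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {k: int(v) for k, v in match.groupdict().items() if k != "dev"}
    return match.group("dev"), counters


sample = ("kernel: [32008.954386] btrfs: bdev /dev/dm-6 errs: "
          "wr 1490, rd 0, flush 18, corrupt 0, gen 0")
dev, counters = parse_errs(sample)
print(dev, counters)
```

Watching how fast `wr` and `flush` grow between two such lines gives a rough idea of how badly a device is failing before the filesystem code trips over it.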

As for a stack trace, I was surprised that I didn't get one, but the lines I
posted are the last ones I got on my serial console (they didn't even make it
to syslog).

To be clearer, all I got was:
[32042.079411] ------------[ cut here ]------------                             
[32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
[32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
[32042.099227] CPU 1                                                            
[32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
rc_core sparse_keymap
LILO 23.2 boot:                                                                 
Loading linux...........................................................        
BIOS data check successful   

I'm booting with:
auto BOOT_IMAGE=linux ro root=900 panic=20 console=tty0 console=ttyS0,115200n8
elevator=cfq pcie_aspm=force edd=off irqpoll

Is panic=20 causing the stack trace not to be printed somehow?

If not, is one of my config options set wrong?
http://marc.merlins.org/tmp/config-3.5.3-amd64-preempt-noide-20120903

Thanks,
Marc

Stefan Behrens

2012-Sep-21 04:57 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On 09/21/2012 05:46, Marc MERLIN wrote:
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> rc_core sparse_keymap
>
>      int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> 			    u64 length, u64 logical, struct page *page,
> 			    int mirror_num)
>      {
> 	    struct bio *bio;
> 	    struct btrfs_device *dev;
> 	    DECLARE_COMPLETION_ONSTACK(compl);
> 	    u64 map_length = 0;
> 	    u64 sector;
> 	    struct btrfs_bio *bbio = NULL;
> 	    int ret;
>
> 	    BUG_ON(!mirror_num); <<<<<
>

This was fixed with commit c0901581ad077004145c9ee80e843fba71c100b8 and
is included in Linux 3.6-rc1.

Marc MERLIN

2012-Sep-21 05:43 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:1884!

On Fri, Sep 21, 2012 at 06:57:32AM +0200, Stefan Behrens wrote:
> >	    BUG_ON(!mirror_num); <<<<<
> >
> 
> This was fixed with commit c0901581ad077004145c9ee80e843fba71c100b8 and 
> is included in Linux 3.6 RC1.

Congrats to all of you for having a time machine and fixing my reported bugs
in the past :)

Thanks for the fix and the link,
Marc

Marc MERLIN

2012-Sep-23 16:16 UTC

Re: crash in read_extent_buffer+0xb7/0xfb

On Thu, Sep 20, 2012 at 08:46:52PM -0700, Marc MERLIN wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
> > I had a btrfs built on top of 5 drives (dmcrypt devices).
> > 
> > The drive then died while I was writing to the filesystem and my system
> > crashed and rebooted:
> > 
> > [384555.534020] sd 10:0:0:0: rejecting I/O to offline device
> > [384555.535057] sd 10:0:0:0: rejecting I/O to offline device
> > [384556.666885] ------------[ cut here ]------------
> > [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache
> > [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!
> > [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP
> > [384556.687878] CPU 2
> > 
>  
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> rc_core sparse_keymap

I had a different crash while copying to a btrfs 5-disk array. Not sure if this
is fixed too, but I'm pasting it just in case.
 
[207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 76, corrupt 0, gen 0
[207055.067267] btrfs bad mapping eb start 8653217792 len 4096, wanted 18446744050581869634 4
[207055.078099] general protection fault: 0000 [#1] PREEMPT SMP
[207055.085213] CPU 3
[207055.087173] Modules linked in:[207055.091512]  raid456 async_raid6_recov
async_pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4
kl5kusb105 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc
ipt_REJECT xt_state xt_tcpudp xt_LOG iptable_mangle iptable_filter deflate ctr
twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common
camellia_generic camellia_x86_64 serpent_sse2_x86_64 lrw serpent_generic xts
gf128mul blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic
xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo dm_crypt dm_mirror
dm_region_hash dm_log aes_x86_64 fuse lm85 hwmon_vid dm_snapshot dm_mod
iptable_nat ip_tables nf_conntrack_ftp ipt_MASQUERADE nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss
snd_hda_codec_hdmi snd_hda_codec_realtek snd_cmipci gameport rc_ati_x10
snd_opl3_lib snd_mpu401_uart pl2303 ati_remote rc_core snd_seq_midi
snd_seq_midi_event snd_seq usbserial snd_rawmidi kvm_intel kvm snd_seq_device
snd_hda_intel[207055.193933]  i915 snd_hda_codec drm_kms_helper snd_hwdep
snd_pcm drm snd_timer eeepc_wmi asus_wmi sparse_keymap rfkill snd i2c_i801
parport_pc acpi_cpufreq i2c_algo_bit microcode crc32c_intel ehci_hcd xhci_hcd
ghash_clmulni_intel pci_hotplug wmi cryptd r8169 snd_page_alloc soundcore
pcspkr tpm_tis mperf tpm evdev tpm_bios usbcore i2c_core parport mii lpc_ich
mei sata_sil24 coretemp sata_mv fan thermal processor button video thermal_sys
usb_common [last unloaded: kl5kusb105]

[207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W   
3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product
Name/P8H67-M PRO
[207055.261478] RIP: 0010:[<ffffffff811fc9ae>]  [<ffffffff811fc9ae>]
read_extent_buffer+0xb7/0xfb
[207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
[207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX:
ffff8800405ba2c8
[207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI:
ffff880105ff38ec
[207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09:
0000160000000000
[207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12:
0000000000000004
[207055.313356] R13: ffff880000000000 R14: fffffffa9d7b9446 R15: 0000000000000442
[207055.322032] FS:  0000000000000000(0000) GS:ffff88011f380000(0000)
knlGS:0000000000000000
[207055.331692] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[207055.339014] CR2: 00000000f7021000 CR3: 0000000001a0c000 CR4:
00000000000407e0
[207055.347715] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[207055.356403] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[207055.365092] Process btrfs-transacti (pid: 6456, threadinfo
ffff880105ff2000,task ffff880105e7e600)
[207055.376219] Stack:
[207055.380369]  fffffffa9d7b9442 000fffffffa9d7b9 ffff880105ff38a0
0000000000000000
[207055.389447]  ffff8800405ba1f8 fffffffa9d7b9431 fffffffa9d7b9442
00000000798be017
[207055.398481]  ffff880105ff3910 ffffffff811f2855 ffff8800405ba1f8
fffffffa9d7b9000
[207055.407543] Call Trace:
[207055.411582]  [<ffffffff811f2855>] btrfs_token_item_offset+0x86/0xb8
[207055.419436]  [<ffffffff811f295f>] btrfs_item_offset+0xb/0xd
[207055.426585]  [<ffffffff811c04bf>] btrfs_item_offset_nr+0x14/0x16
[207055.434143]  [<ffffffff811c08f9>] leaf_space_used+0x58/0x81
[207055.441269]  [<ffffffff811c42ea>] btrfs_leaf_free_space+0x33/0x72
[207055.448924]  [<ffffffff811c4d45>] push_leaf_right+0xa1/0x142
[207055.456092]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.463329]  [<ffffffff811c4f13>] split_leaf+0x79/0x52f
[207055.470222]  [<ffffffff811f295f>] ? btrfs_item_offset+0xb/0xd
[207055.477483]  [<ffffffff811c08f9>] ? leaf_space_used+0x58/0x81
[207055.484744]  [<ffffffff814aac0e>] ? _raw_write_unlock+0x28/0x33
[207055.492203]  [<ffffffff8120a523>] ?
btrfs_set_lock_blocking_rw+0x9b/0xec
[207055.500770]  [<ffffffff811c5b5c>] btrfs_search_slot+0x583/0x62e
[207055.508199]  [<ffffffff811c6e32>] btrfs_insert_empty_items+0x62/0xb4
[207055.516029]  [<ffffffff811cef40>] run_clustered_refs+0x3e2/0x741
[207055.523655]  [<ffffffff811cf503>] btrfs_run_delayed_refs+0x264/0x373
[207055.531450]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
[207055.538950]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.545965]  [<ffffffff814aaab9>] ? _raw_spin_unlock+0x27/0x32
[207055.553168]  [<ffffffff811f6c51>] ?
btrfs_run_ordered_operations+0x19f/0x1ae
[207055.561517]  [<ffffffff811dd30f>] btrfs_commit_transaction+0xa9/0x8dc
[207055.569231]  [<ffffffff8105957a>] ? add_wait_queue+0x44/0x44
[207055.576235]  [<ffffffff81049f32>] ?
init_timer_deferrable_key+0x17/0x17
[207055.584056]  [<ffffffff811d7e58>] transaction_kthread+0x174/0x230
[207055.591332]  [<ffffffff811d7ce4>] ? try_to_freeze+0x33/0x33
[207055.598153]  [<ffffffff81058e3c>] kthread+0x86/0x8e
[207055.604162]  [<ffffffff814b08a4>] kernel_thread_helper+0x4/0x10
[207055.611168]  [<ffffffff81058db6>] ?
kthread_freezable_should_stop+0x3e/0x3e
[207055.619358]  [<ffffffff814b08a0>] ? gs_change+0x13/0x13
[207055.625624] Code: b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49
c1 e0 03 eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0
<4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9
[207055.647970] RIP  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.655271]  RSP <ffff880105ff3880>
[207055.665029] ---[ end trace 06a6f0aa8102336a ]---
[207055.671223] Kernel panic - not syncing: Fatal exception




David Sterba

2012-Sep-24 13:08 UTC

Re: crash in read_extent_buffer+0xb7/0xfb

On Sun, Sep 23, 2012 at 09:16:34AM -0700, Marc MERLIN wrote:
> > Oh my, now I'm trying again with a new drive, and a big cp from an
> > existing array to a new one dies with:
> > [32042.079411] ------------[ cut here ]------------
> > [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> > [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> > [32042.099227] CPU 1
> > [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async_pq
> > raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> > ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10
> > snd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi
> > rc_core sparse_keymap
> 
> I had a different crash while copying to a btrfs 5 disk array. Not sure if this is
> also fixed too, but pasting just in case.
>  
> [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 7 6, corrupt 0, gen 0

So many write and flush errors?

> [207055.067267] btrfs bad mapping eb start 8653217792 len 4096, wanted 184467440 50581869634 4

	if (start + min_len > eb->len) {
		printk(KERN_ERR "btrfs bad mapping eb start %llu len %lu, "
		       "wanted %lu %lu\n", (unsigned long long)eb->start,
		       eb->len, start, min_len);
		WARN_ON(1);
		return -EINVAL;
	}

8653217792  = 0x203c5a000	eb->start
4096       			eb->len

184467440   = 0x00afebff0	start
50581869634 = 0xbc6ea1442	min_len

bogus numbers, no pattern, not visible in the stacktrace.
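
A note for readers of this archive: the serial console that captured the log wrapped its lines at 80 columns, so the two tokens read above as start and min_len may really be one number split across two lines, followed by the actual min_len of 4 (the format string prints exactly two values). The Python sketch below, which assumes that reading, re-checks the hex conversions above and compares the rejoined value with words from the Stack: section of the oops. It is an illustration under that assumption, not a confirmed diagnosis.

```python
# Values copied from the "bad mapping" log line and the oops in this thread.
EB_START = 8653217792
TOKEN_A = "184467440"      # last token before the 80-column wrap
TOKEN_B = "50581869634"    # first token after the wrap

# Hex conversions as given in the message above.
assert hex(EB_START) == "0x203c5a000"
assert hex(int(TOKEN_A)) == "0xafebff0"
assert hex(int(TOKEN_B)) == "0xbc6ea1442"

# Assumption: the two tokens are one unsigned long cut by the console wrap.
joined = int(TOKEN_A + TOKEN_B)
print(hex(joined))

# Words taken verbatim from the Stack: section of the oops.
stack_words = [0xfffffffa9d7b9442, 0x000fffffffa9d7b9, 0xfffffffa9d7b9431]
print(joined in stack_words)

# The same bit pattern viewed as a signed 64-bit value.
print(joined - 2**64)
```

If the rejoined reading is right, the offset is a small negative number stored in an unsigned long, and it does appear in the stack dump.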

> [207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W   
3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product
Name/P8H67-M PRO
> [207055.261478] RIP: 0010:[<ffffffff811fc9ae>] 
[<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
> [207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
> [207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX:
ffff8800405ba2c8
> [207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI:
ffff880105ff38ec
> [207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09:
0000160000000000
> [207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12:
0000000000000004
R11 contains the POISON_FREE pattern, though it's not clear who used it and
where. It may come from some unhandled case in the write error
recovery paths.
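
As a cross-check for archive readers: POISON_FREE in include/linux/poison.h is the byte 0x6b, whereas the R11 value repeats the bytes 6d b6 db. The Code: line of the oops also begins with b7 6d db b6 6d db b6 6d, which is the little-endian encoding of the R11 value and would be consistent with an immediate operand. The Python sketch below tests one alternative explanation, offered as an assumption rather than a finding: that R11 holds the multiplicative inverse of 7 modulo 2^64, a constant compilers commonly emit to turn an exact division by a multiple of 7 into a multiply. The structure size of 56 bytes is hypothetical and chosen only for illustration.

```python
R11 = 0x6db6db6db6db6db7   # from the register dump
POISON_FREE = 0x6b          # include/linux/poison.h
MASK64 = 2**64 - 1

# What a register filled with POISON_FREE bytes would look like.
poisoned = int.from_bytes(bytes([POISON_FREE]) * 8, "little")
print(hex(poisoned), poisoned == R11)

# Little-endian bytes of R11, for comparison with the start of the Code: line.
print(R11.to_bytes(8, "little").hex(" "))

# Is R11 the inverse of 7 modulo 2**64?
print((R11 * 7) & MASK64)

# If so, multiplying an exact multiple of 7 by R11 divides it by 7.
# 56 = 7 * 8 is a hypothetical structure size used only as an example.
n = 12345
byte_offset = n * 56
print(((byte_offset >> 3) * R11) & MASK64 == n)
```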

The crash site is not any of the BUG_ONs but some place that actually
tries to access unmapped memory, so from that point it slipped
through the sanity checks.


david

Marc MERLIN

2012-Sep-24 14:41 UTC

Re: crash in read_extent_buffer+0xb7/0xfb

On Mon, Sep 24, 2012 at 03:08:47PM +0200, David Sterba wrote:
> > I had a different crash while copying to a btrfs 5 disk array. Not sure if this is
> > also fixed too, but pasting just in case.
> >  
> > [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 7 6, corrupt 0, gen 0
> 
> So many write and flush errors? 

It's possible, I have crappy drives that were cheap that I'm using for tests
and copies.

> R11 contains the POISON_FREE pattern, though it's not clear who and where
> used it. It may come from some unhandled case in the write error
> recovery paths. 

Considering that I was doing a huge copy to a btrfs filesystem (source was
ext4) and that I was using crappy drives in a 5-drive configuration
with no redundancy since there is no raid5 yet, it's very possible.

> The crash site is not any of the BUG_ON but some place that actually
> tries to access an unmapped memory, so from that point it slipped
> through sanity checks.

If that helps, I forgot to decode the ASM:

All code
========
   0:   b7 6d                   mov    $0x6d,%bh
   2:   db b6 6d db b6 6d       (bad)  0x6db6db6d(%rsi)
   8:   49 bd 00 00 00 00 00    movabs $0xffff880000000000,%r13
   f:   88 ff ff 
  12:   49 c1 e0 03             shl    $0x3,%r8
  16:   eb 43                   jmp    0x5b
  18:   48 8b 8b 50 01 00 00    mov    0x150(%rbx),%rcx
  1f:   4c 89 d0                mov    %r10,%rax
  22:   48 89 d7                mov    %rdx,%rdi
  25:   4c 29 f8                sub    %r15,%rax
  28:   4c 39 e0                cmp    %r12,%rax
  2b:*  4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx     <-- trapping instruction
  2f:   49 0f 47 c4             cmova  %r12,%rax
  33:   49 83 c0 08             add    $0x8,%r8
  37:   49 29 c4                sub    %rax,%r12
  3a:   4c 01 c9                add    %r9,%rcx
  3d:   48                      rex.W
  3e:   c1                      .byte 0xc1
  3f:   f9                      stc

Code starting with the faulting instruction
===========================================
   0:   4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx
   4:   49 0f 47 c4             cmova  %r12,%rax
   8:   49 83 c0 08             add    $0x8,%r8
   c:   49 29 c4                sub    %rax,%r12
   f:   4c 01 c9                add    %r9,%rcx
  12:   48                      rex.W
  13:   c1                      .byte 0xc1
  14:   f9                      stc
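
For archive readers, the effective address of the trapping mov (%rcx,%r8,1),%rcx can be recomputed from the register dump. The Python sketch below adds RCX and R08 and applies the x86-64 canonical-address rule for 48-bit virtual addresses (bits 63 to 47 must all be equal). A non-canonical address raises a general protection fault rather than a page fault, which would match the "general protection fault" line in the oops. The link between R08 and the offset logged in the "bad mapping" line assumes 4 KiB pages, 8-byte pointers, and that the wrapped log value is one number; it is an interpretation, not something confirmed in the thread.

```python
RCX = 0xffff8800405ba2c8   # base register at the trap
R08 = 0x007ffffffd4ebdc8   # index register at the trap
MASK64 = 2**64 - 1


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all 0 or all 1."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


fault_addr = (RCX + R08) & MASK64
print(hex(fault_addr), is_canonical(fault_addr))

# Assumption: R08 is a page index scaled by the pointer size, where the
# index is the oversized offset from the "bad mapping" line shifted by 12.
start = 0xfffffffa9d7b9442
print(hex((start >> 12) * 8), (start >> 12) * 8 == R08)
```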

For 

[207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W   
3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product
Name/P8H67-M PRO
[207055.261478] RIP: 0010:[<ffffffff811fc9ae>]  [<ffffffff811fc9ae>]
read_extent_buffer+0xb7/0xfb
[207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
[207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX:
ffff8800405ba2c8
[207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI:
ffff880105ff38ec
[207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09:
0000160000000000
[207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12:
0000000000000004
[207055.313356] R13: ffff880000000000 R14: fffffffa9d7b9446 R15: 0000000000000442
[207055.322032] FS:  0000000000000000(0000) GS:ffff88011f380000(0000)
knlGS:0000000000000000
[207055.331692] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[207055.339014] CR2: 00000000f7021000 CR3: 0000000001a0c000 CR4:
00000000000407e0
[207055.347715] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[207055.356403] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[207055.365092] Process btrfs-transacti (pid: 6456, threadinfo
ffff880105ff2000,task ffff880105e7e600)
[207055.376219] Stack:
[207055.380369]  fffffffa9d7b9442 000fffffffa9d7b9 ffff880105ff38a0
0000000000000000
[207055.389447]  ffff8800405ba1f8 fffffffa9d7b9431 fffffffa9d7b9442
00000000798be017
[207055.398481]  ffff880105ff3910 ffffffff811f2855 ffff8800405ba1f8
fffffffa9d7b9000
[207055.407543] Call Trace:
[207055.411582]  [<ffffffff811f2855>] btrfs_token_item_offset+0x86/0xb8
[207055.419436]  [<ffffffff811f295f>] btrfs_item_offset+0xb/0xd
[207055.426585]  [<ffffffff811c04bf>] btrfs_item_offset_nr+0x14/0x16
[207055.434143]  [<ffffffff811c08f9>] leaf_space_used+0x58/0x81
[207055.441269]  [<ffffffff811c42ea>] btrfs_leaf_free_space+0x33/0x72
[207055.448924]  [<ffffffff811c4d45>] push_leaf_right+0xa1/0x142
[207055.456092]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.463329]  [<ffffffff811c4f13>] split_leaf+0x79/0x52f
[207055.470222]  [<ffffffff811f295f>] ? btrfs_item_offset+0xb/0xd
[207055.477483]  [<ffffffff811c08f9>] ? leaf_space_used+0x58/0x81
[207055.484744]  [<ffffffff814aac0e>] ? _raw_write_unlock+0x28/0x33
[207055.492203]  [<ffffffff8120a523>] ?
btrfs_set_lock_blocking_rw+0x9b/0xec
[207055.500770]  [<ffffffff811c5b5c>] btrfs_search_slot+0x583/0x62e
[207055.508199]  [<ffffffff811c6e32>] btrfs_insert_empty_items+0x62/0xb4
[207055.516029]  [<ffffffff811cef40>] run_clustered_refs+0x3e2/0x741
[207055.523655]  [<ffffffff811cf503>] btrfs_run_delayed_refs+0x264/0x373
[207055.531450]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
[207055.538950]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.545965]  [<ffffffff814aaab9>] ? _raw_spin_unlock+0x27/0x32
[207055.553168]  [<ffffffff811f6c51>] ?
btrfs_run_ordered_operations+0x19f/0x1ae
[207055.561517]  [<ffffffff811dd30f>] btrfs_commit_transaction+0xa9/0x8dc
[207055.569231]  [<ffffffff8105957a>] ? add_wait_queue+0x44/0x44
[207055.576235]  [<ffffffff81049f32>] ? init_timer_deferrable_key+0x17/0x17
[207055.584056]  [<ffffffff811d7e58>] transaction_kthread+0x174/0x230
[207055.591332]  [<ffffffff811d7ce4>] ? try_to_freeze+0x33/0x33
[207055.598153]  [<ffffffff81058e3c>] kthread+0x86/0x8e
[207055.604162]  [<ffffffff814b08a4>] kernel_thread_helper+0x4/0x10
[207055.611168]  [<ffffffff81058db6>] ? kthread_freezable_should_stop+0x3e/0x3e
[207055.619358]  [<ffffffff814b08a0>] ? gs_change+0x13/0x13
[207055.625624] Code: b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49 c1 e0 03 eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0 <4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9
[207055.647970] RIP  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.655271]  RSP <ffff880105ff3880>
[207055.665029] ---[ end trace 06a6f0aa8102336a ]---
[207055.671223] Kernel panic - not syncing: Fatal exception
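
For readers following the trace: in an oops "Code:" line, the byte wrapped in angle brackets is the one at the faulting RIP. The following is only a small sketch (the function name trapping_offset is made up for illustration) that finds that byte's offset in the dump above, so it can be matched against a disassembly of the same bytes.

```python
import re

# The "Code:" bytes copied from the oops above (timestamp and prefix removed).
CODE_LINE = (
    "b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49 "
    "c1 e0 03 eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0 "
    "<4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9"
)


def trapping_offset(code_line):
    """Return (offset, byte) of the <xx>-marked byte in an oops Code: line.

    Hypothetical helper for illustration; not part of any kernel tool.
    """
    for offset, token in enumerate(code_line.split()):
        match = re.fullmatch(r"<([0-9a-f]{2})>", token)
        if match:
            return offset, int(match.group(1), 16)
    raise ValueError("no <xx> marker found in Code: line")


offset, byte = trapping_offset(CODE_LINE)
print(f"trapping byte 0x{byte:02x} at offset 0x{offset:x}")
```

The offset it reports is relative to the first byte of the dump, which is the same numbering a disassembly of these bytes would use.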

David Sterba

2012-Sep-24 15:37 UTC


Re: crash in read_extent_buffer+0xb7/0xfb

On Mon, Sep 24, 2012 at 07:41:03AM -0700, Marc MERLIN wrote:
> It's possible, I have crappy drives that were cheap that I'm using for tests
> and copies.

Yeah, that makes a good use of crappy disks :)

> Considering that I was doing a huge copy to a btrfs filesystem (source was
> ext4) and that I was using crappy drives in a 5 drives configuration
> with no redundancy since there is no raid5 yet, it's very possible.

Well, in your case raid1 might not be enough to protect the data.

>    0:   b7 6d                   mov    $0x6d,%bh
>    2:   db b6 6d db b6 6d       (bad)  0x6db6db6d(%rsi)
>    8:   49 bd 00 00 00 00 00    movabs $0xffff880000000000,%r13
>    f:   88 ff ff 
>   12:   49 c1 e0 03             shl    $0x3,%r8
>   16:   eb 43                   jmp    0x5b
>   18:   48 8b 8b 50 01 00 00    mov    0x150(%rbx),%rcx
>   1f:   4c 89 d0                mov    %r10,%rax
>   22:   48 89 d7                mov    %rdx,%rdi
>   25:   4c 29 f8                sub    %r15,%rax
>   28:   4c 39 e0                cmp    %r12,%rax
>   2b:*  4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx     <-- trapping instruction

ffff8800405ba2c8 + 007ffffffd4ebdc8 = 1007f88003daa6090, which overflows 64 bits.
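
The sum above can be checked mechanically. The sketch below assumes the two values are the base (%rcx) and index (%r8) of the trapping mov, which the message does not state explicitly, and it assumes 48-bit virtual addressing for the canonical-address check. The name is_canonical is made up for illustration.

```python
MASK64 = (1 << 64) - 1

# Operands as quoted in the message; their mapping to %rcx/%r8 is an assumption.
base = 0xFFFF8800405BA2C8
index = 0x007FFFFFFD4EBDC8

full_sum = base + index        # exact sum, not limited to 64 bits
wrapped = full_sum & MASK64    # what a 64-bit address calculation keeps


def is_canonical(addr, va_bits=48):
    """True if bits (va_bits - 1)..63 of addr are all equal.

    Assumes x86-64 with 48-bit virtual addresses (hypothetical helper).
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


print(hex(full_sum))             # the 17-digit sum quoted above
print(full_sum > MASK64)         # whether it exceeds 64 bits
print(hex(wrapped))              # value after truncation to 64 bits
print(is_canonical(base))        # the base pointer on its own
print(is_canonical(wrapped))     # the truncated sum
```

Under these assumptions the base pointer alone looks like an ordinary kernel address, while the truncated sum is not a canonical address, which would fit an index computed from a bogus item offset.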

I'm afraid this does not tell much of the story. The last function that
is not a struct helper was leaf_space_used(), via push_leaf_right(),
split_leaf() from btrfs_search_slot() -- all sanity checks I see are past
any of those calls, so it's probably corrupted on-disk.
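
As a rough illustration of how that chain is read off the oops: entries marked with "?" in a Call Trace are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are skipped. The sketch below uses the top of the trace from the report; the HELPERS set is my own guess at what counts as a "struct helper", not something stated in the thread.

```python
import re

# Top of the Call Trace from the report (timestamps removed, wrapped lines joined).
TRACE = """\
[<ffffffff811f2855>] btrfs_token_item_offset+0x86/0xb8
[<ffffffff811f295f>] btrfs_item_offset+0xb/0xd
[<ffffffff811c04bf>] btrfs_item_offset_nr+0x14/0x16
[<ffffffff811c08f9>] leaf_space_used+0x58/0x81
[<ffffffff811c42ea>] btrfs_leaf_free_space+0x33/0x72
[<ffffffff811c4d45>] push_leaf_right+0xa1/0x142
[<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[<ffffffff811c4f13>] split_leaf+0x79/0x52f
[<ffffffff811f295f>] ? btrfs_item_offset+0xb/0xd
[<ffffffff811c08f9>] ? leaf_space_used+0x58/0x81
[<ffffffff814aac0e>] ? _raw_write_unlock+0x28/0x33
[<ffffffff8120a523>] ? btrfs_set_lock_blocking_rw+0x9b/0xec
[<ffffffff811c5b5c>] btrfs_search_slot+0x583/0x62e
"""

FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Assumption: these accessors are the "struct helpers" meant in the reply.
HELPERS = {"btrfs_token_item_offset", "btrfs_item_offset", "btrfs_item_offset_nr"}


def reliable_frames(trace):
    """Return function names of trace entries that are not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


frames = reliable_frames(TRACE)
first_real = next(name for name in frames if name not in HELPERS)
print(" <- ".join(frames))
print("first non-helper frame:", first_real)
```

The list is innermost first, so reading it left to right walks outward from the faulting accessor toward btrfs_search_slot().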

The call stack is unfortunately deep and going backwards in assembly to
track where R11 could get set is tedious.

Did you see any other messages in the log? If you could recreate the
filesystem and workload, doing a fsck occasionally may narrow down the
surface for analysis. Otherwise I'm out of ideas now.


david
