Remco Hosman
2012-Apr-20 18:41 UTC
kernel bug in 3.4.0-rc3 after disconnecting/reconnecting drives
I managed to break my test filesystem after several rounds of disconnecting one disk, reconnecting it, disconnecting all disks at the same time (unplugging the USB enclosure they are in), and things like that. When I try to mount, I get this error in dmesg:

[ 46.645732] btrfs: use zlib compression
[ 46.645765] btrfs: disk space caching is enabled
[ 46.772220] parent transid verify failed on 2506608713728 wanted 31547 found 14280
[ 46.772270] failed mirror was 0
[ 46.773025] parent transid verify failed on 2506608713728 wanted 31547 found 14280
[ 46.773060] failed mirror was 0
[ 46.785667] ------------[ cut here ]------------
[ 46.785695] kernel BUG at fs/btrfs/extent_io.c:1890!
[ 46.785719] invalid opcode: 0000 [#1] PREEMPT SMP
[ 46.785756] CPU 0
[ 46.785768] Modules linked in: btrfs zlib_deflate crc32c libcrc32c nouveau video mxm_wmi wmi drm_kms_helper ttm drm nvidiafb vgastate skge forcedeth powernow_k8 serio_raw pcspkr mperf microcode evdev ns558 gameport i2c_nforce2 i2c_core thermal button processor fan usbhid hid sd_mod pata_amd usb_storage pata_acpi sata_sil ata_generic ohci_hcd sata_sil24 sata_nv libata scsi_mod ehci_hcd usbcore usb_common
[ 46.786186]
[ 46.786204] Pid: 648, comm: mount Not tainted 3.4.0-rc3-RH #1 System manufacturer System name/A8N-SLI DELUXE
[ 46.786266] RIP: 0010:[<ffffffffa042a86f>] [<ffffffffa042a86f>] repair_io_failure+0x17f/0x1c0 [btrfs]
[ 46.786368] RSP: 0018:ffff8800b8fab9f8 EFLAGS: 00010246
[ 46.786395] RAX: ffff8800b8faba28 RBX: 000002479d85a000 RCX: 000002479d85a000
[ 46.786428] RDX: 0000000000001000 RSI: 000002479d85a000 RDI: ffff8800b9dc4108
[ 46.786461] RBP: ffff8800b8faba68 R08: ffffea0002e7c380 R09: 0000000000000000
[ 46.786493] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001000
[ 46.786525] R13: ffffea0002e7c380 R14: ffff8800b9dc4108 R15: 0000000000000000
[ 46.786558] FS: 00007f66a1134740(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
[ 46.786598] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 46.786626] CR2: 00007f9487dbb000 CR3: 00000000b9f01000 CR4: 00000000000007f0
[ 46.786659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 46.786691] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 46.786724] Process mount (pid: 648, threadinfo ffff8800b8faa000, task ffff8800b8b9bf80)
[ 46.786762] Stack:
[ 46.786779] ffff8800b8fab9f8 000002479d85a000 0000000000000000 0000000000000000
[ 46.786840] 0000000000000000 00000000783f0000 ffff8800b8faba28 ffff8800b8faba28
[ 46.786899] 0000000000000000 000002479d85a000 0000000000000000 ffff8800b9dc4108
[ 46.786957] Call Trace:
[ 46.787001] [<ffffffffa042b212>] repair_eb_io_failure+0x82/0xb0 [btrfs]
[ 46.787053] [<ffffffffa0401252>] btree_read_extent_buffer_pages.constprop.111+0x102/0x130 [btrfs]
[ 46.787117] [<ffffffffa0401a3a>] read_tree_block+0x3a/0x50 [btrfs]
[ 46.787168] [<ffffffffa04054f2>] open_ctree+0x12c2/0x1ad0 [btrfs]
[ 46.787206] [<ffffffff812a30ba>] ? disk_name+0xba/0xc0
[ 46.787248] [<ffffffffa03e2846>] btrfs_mount+0x5b6/0x6a0 [btrfs]
[ 46.787283] [<ffffffff8114dfe0>] ? alloc_pages_current+0xb0/0x120
[ 46.787317] [<ffffffff81172193>] mount_fs+0x43/0x1b0
[ 46.787347] [<ffffffff8118c310>] vfs_kern_mount+0x70/0x100
[ 46.787378] [<ffffffff8118c834>] do_kern_mount+0x54/0x110
[ 46.787410] [<ffffffff8118e11a>] do_mount+0x26a/0x850
[ 46.787442] [<ffffffff81110c3e>] ? __get_free_pages+0xe/0x50
[ 46.787473] [<ffffffff8118dd1a>] ? copy_mount_options+0x3a/0x180
[ 46.787505] [<ffffffff8118e83d>] sys_mount+0x8d/0xe0
[ 46.787535] [<ffffffff814b9929>] system_call_fastpath+0x16/0x1b
[ 46.787564] Code: 82 d7 e0 b8 fb ff ff ff 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 b8 fb ff ff ff eb dd <0f> 0b 0f 0b 49 8b 45 08 49 8b 8f 88 00 00 00 4d 89 f0 48 8b 55
[ 46.787993] RIP [<ffffffffa042a86f>] repair_io_failure+0x17f/0x1c0 [btrfs]
[ 46.788053] RSP <ffff8800b8fab9f8>
[ 46.788111] ---[ end trace c2c3c0f7ca538d25 ]---

Then I tried btrfsck (current git pull from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git). It produced a LOT of output (I have the full output: 8.3 MB, 370k compressed with xz, 544k with bzip2). To give a few "highlights", it started with a lot of lines like this:

parent transid verify failed on 2506608713728 wanted 31547 found 14280
parent transid verify failed on 2506608713728 wanted 31547 found 14280

then some:

bad block 2506607267840
leaf parent key incorrect 2506608812032
bad block 2506608812032

some like this:

parent transid verify failed on 2506600460288 wanted 31542 found 14288
Ignoring transid failure
parent transid verify failed on 2506605555712 wanted 31545 found 14280

then some:

ref mismatch on [2506472095744 4096] extent item 1, found 0
Backref 2506472095744 root 2 not referenced back 0x2c6cdc0
Incorrect global backref count on 2506472095744 found 1 wanted 0
backpointer mismatch on [2506472095744 4096]
owner ref check failed [2506472095744 4096]

and:

Backref 2506995310592 parent 2524764864512 not referenced back 0x89971b0
Incorrect global backref count on 2506995310592 found 1 wanted 0
backpointer mismatch on [2506995310592 4096]
owner ref check failed [2506995310592 4096]
ref mismatch on [2506995359744 4096] extent item 1, found 0

and ended with:

checking root refs
found 2341466357760 bytes used err is 0
total csum bytes: 2243015812
total tree bytes: 3809906688
total fs tree bytes: 766795776
btree space waste bytes: 626726388
file data blocks allocated: 2295997747200
 referenced 2316898037760
Btrfs Btrfs v0.19

Kernel version is 3.4.0-rc3. Output from `btrfs file show`:

Label: none uuid: 24779492-902d-4ba5-8807-ed18e86033cf
	Total devices 6 FS bytes used 2.13TB
	devid 9 size 465.76GB used 4.00GB path /dev/sdg
	devid 2 size 1.36TB used 1.16TB path /dev/sdc
	devid 7 size 465.76GB used 380.76GB path /dev/sdf
	devid 5 size 1.36TB used 1.16TB path /dev/sdd
	devid 6 size 465.76GB used 380.76GB path /dev/sde
	devid 8 size 2.73TB used 2.52TB path /dev/sda

So:
1) How can I help locate this issue?
2) Should I keep the filesystem, or format and start over to reproduce the error?
3) Is there anything else I can try to fix the filesystem? Not that I need the data; it would just be to test whether recovery is possible.

Remco