Hello,

Somehow my subvolume with /home got corrupted. When I booted the machine
this morning (after a perfectly normal shutdown) it gave me a bunch of
kernel errors. I found out that if I comment out my /home entry in fstab,
it boots OK, so / is not corrupted. I then booted from the live CD and
set "clear_cache" for /home instead of "inode_cache,space_cache":

/dev/disk/by-label/btrfs-root / btrfs defaults,noatime,inode_cache,space_cache 0 0
/dev/disk/by-label/btrfs-root /var/lib/btrfs-root btrfs defaults,noatime,subvolid=0 0 0
#/dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,inode_cache,space_cache 0 0
/dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,clear_cache 0 0
/var/lib/btrfs-root/boot /boot none bind 0 0

Then I could mount the /home subvolume.

I also found the corrupted file:

? -????????? ? ? ? ? ? 13.4.4.40.js

Whenever I try to access it I get an Input/output error and the
following error in kernel.log:

Oct 10 10:38:03 yukikaze kernel: [34592.275080] parent transid verify failed on 105930436608 wanted 58565 found 134248
Oct 10 10:38:03 yukikaze kernel: [34592.275161] BUG: scheduling while atomic: ls/2545/0x00000002
Oct 10 10:38:03 yukikaze kernel: [34592.275166] Modules linked in: ipv6 loop usb_storage uas radeon snd_hda_codec_hdmi ttm snd_hda_codec_via drm_kms_helper ppdev sg snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd edac_core soundcore sp5100_tco r8169 drm firewire_ohci firewire_core i2c_algo_bit i2c_piix4 i2c_core edac_mce_amd parport_pc shpchp parport pci_hotplug pcspkr evdev mii serio_raw k10temp psmouse asus_atk0110 snd_page_alloc crc_itu_t wmi button powernow_k8 processor mperf sr_mod cdrom sd_mod pata_acpi usbhid hid ohci_hcd pata_atiixp ahci libahci libata ehci_hcd scsi_mod usbcore
Oct 10 10:38:03 yukikaze kernel: [34592.275268] Pid: 2545, comm: ls Not tainted 3.0.6-aya1 #3
Oct 10 10:38:03 yukikaze kernel: [34592.275273] Call Trace:
Oct 10 10:38:03 yukikaze kernel: [34592.275288] [<ffffffff8143fd33>] __schedule_bug+0x5f/0x64
Oct 10 10:38:03 yukikaze kernel: [34592.275298] [<ffffffff81447c89>] __schedule+0x7c9/0x980
Oct 10 10:38:03 yukikaze kernel: [34592.275310] [<ffffffff812705e7>] ? submit_bio+0x87/0x110
Oct 10 10:38:03 yukikaze kernel: [34592.275320] [<ffffffff81009e29>] ? read_tsc+0x9/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275329] [<ffffffff8107e7bd>] ? ktime_get_ts+0xad/0xe0
Oct 10 10:38:03 yukikaze kernel: [34592.275338] [<ffffffff810eb550>] ? __lock_page+0x70/0x70
Oct 10 10:38:03 yukikaze kernel: [34592.275346] [<ffffffff8104ac6f>] schedule+0x3f/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275354] [<ffffffff81447fbf>] io_schedule+0x8f/0xd0
Oct 10 10:38:03 yukikaze kernel: [34592.275362] [<ffffffff810eb55e>] sleep_on_page+0xe/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275370] [<ffffffff8144876f>] __wait_on_bit+0x5f/0x90
Oct 10 10:38:03 yukikaze kernel: [34592.275379] [<ffffffff810eb748>] wait_on_page_bit+0x78/0x80
Oct 10 10:38:03 yukikaze kernel: [34592.275388] [<ffffffff81074140>] ? autoremove_wake_function+0x40/0x40
Oct 10 10:38:03 yukikaze kernel: [34592.275397] [<ffffffff81210902>] read_extent_buffer_pages+0x412/0x480
Oct 10 10:38:03 yukikaze kernel: [34592.275405] [<ffffffff811e4410>] ? verify_parent_transid+0x240/0x240
Oct 10 10:38:03 yukikaze kernel: [34592.275414] [<ffffffff811e529a>] btree_read_extent_buffer_pages.isra.61+0x8a/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275422] [<ffffffff811e6bf1>] read_tree_block+0x41/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275431] [<ffffffff811cbaab>] read_block_for_search.isra.33+0x1fb/0x500
Oct 10 10:38:03 yukikaze kernel: [34592.275439] [<ffffffff811cb0bd>] ? generic_bin_search.constprop.35+0x17d/0x1f0
Oct 10 10:38:03 yukikaze kernel: [34592.275447] [<ffffffff811cb214>] ? bin_search+0xe4/0x130
Oct 10 10:38:03 yukikaze kernel: [34592.275454] [<ffffffff811ceb48>] btrfs_search_slot+0x358/0x900
Oct 10 10:38:03 yukikaze kernel: [34592.275464] [<ffffffff811e310f>] btrfs_lookup_inode+0x2f/0xa0
Oct 10 10:38:03 yukikaze kernel: [34592.275473] [<ffffffff811f6e38>] btrfs_iget+0x108/0x4d0
Oct 10 10:38:03 yukikaze kernel: [34592.275482] [<ffffffff811e0b7f>] ? btrfs_lookup_dir_item+0xdf/0x110
Oct 10 10:38:03 yukikaze kernel: [34592.275491] [<ffffffff811f78f3>] btrfs_lookup_dentry+0x383/0x480
Oct 10 10:38:03 yukikaze kernel: [34592.275499] [<ffffffff811367b9>] ? kmem_cache_alloc+0x149/0x160
Oct 10 10:38:03 yukikaze kernel: [34592.275508] [<ffffffff811f7a06>] btrfs_lookup+0x16/0x30
Oct 10 10:38:03 yukikaze kernel: [34592.275515] [<ffffffff811561d5>] d_alloc_and_lookup+0x45/0x90
Oct 10 10:38:03 yukikaze kernel: [34592.275524] [<ffffffff811632b5>] ? d_lookup+0x35/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275531] [<ffffffff81157a3e>] do_lookup+0x29e/0x310
Oct 10 10:38:03 yukikaze kernel: [34592.275538] [<ffffffff811586bc>] path_lookupat+0x11c/0x700
Oct 10 10:38:03 yukikaze kernel: [34592.275546] [<ffffffff81158cd1>] do_path_lookup+0x31/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275553] [<ffffffff8115a909>] user_path_at+0x59/0xa0
Oct 10 10:38:03 yukikaze kernel: [34592.275561] [<ffffffff8102f8f0>] ? do_page_fault+0x1c0/0x4d0
Oct 10 10:38:03 yukikaze kernel: [34592.275570] [<ffffffff8114fd64>] vfs_fstatat+0x44/0x70
Oct 10 10:38:03 yukikaze kernel: [34592.275578] [<ffffffff810677fd>] ? do_sigaction+0x12d/0x1f0
Oct 10 10:38:03 yukikaze kernel: [34592.275586] [<ffffffff8114fdcb>] vfs_stat+0x1b/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275593] [<ffffffff8114ff0a>] sys_newstat+0x1a/0x40
Oct 10 10:38:03 yukikaze kernel: [34592.275601] [<ffffffff81067bcd>] ? sys_rt_sigaction+0x8d/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275610] [<ffffffff8144b055>] ? page_fault+0x25/0x30
Oct 10 10:38:03 yukikaze kernel: [34592.275617] [<ffffffff8144b602>] system_call_fastpath+0x16/0x1b

My question: is it possible to delete this rogue file somehow, or to
repair it? I tried to delete the directory that contained it, but got
the same Input/output error. Any help is appreciated.

I should mention that I had the very same error a couple of months ago,
with about 30 files in my /home getting corrupted this way. I had to
create a new subvolume for /home (__home-new) and restore the missing
files from backup. When I tried to delete the corrupted subvolume it
gave me a bunch of kernel errors, but when I repeated the command, it
completed OK. However, after a reboot the space from this subvolume was
not recovered. I tried to balance the subvolume after that, but after a
couple of hours I am getting only the note about 22 extents in my
kernel.log

Oct 10 11:03:22 yukikaze kernel: [36111.396313] btrfs: found 22 extents
Oct 10 11:03:27 yukikaze kernel: [36116.922236] btrfs: found 22 extents
Oct 10 11:03:33 yukikaze kernel: [36122.922488] btrfs: found 22 extents

and no relocation messages. So I think it got stuck.

thanks
~dima

---
archlinux
Linux yukikaze 3.0.6-aya1 #3 SMP PREEMPT Sat Oct 8 19:01:41 JST 2011 x86_64 AMD Athlon(tm) II X4 635 Processor AuthenticAMD GNU/Linux
the latest btrfs-tools
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
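
[Editor's sketch] The "parent transid verify failed" message above carries three numbers: the block address, the transaction id the parent expected ("wanted"), and the one actually found on disk. The following is a minimal, unofficial Python sketch for pulling those numbers out of a saved kernel.log. The function name find_transid_failures and the regular expression are assumptions made for illustration, based only on the message format shown in this thread; this is not a btrfs tool.

```python
import re

# Message format as it appears in this thread, e.g.
# "parent transid verify failed on 105930436608 wanted 58565 found 134248"
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def find_transid_failures(log_text):
    """Return a list of (block, wanted, found) integer tuples.

    Hypothetical helper: it only matches the message layout quoted in
    this thread and assumes each message sits on a single line.
    """
    return [
        tuple(int(group) for group in match.groups())
        for match in TRANSID_RE.finditer(log_text)
    ]


# Sample line copied from the report above.
sample = (
    "Oct 10 10:38:03 yukikaze kernel: [34592.275080] parent transid verify "
    "failed on 105930436608 wanted 58565 found 134248"
)

failures = find_transid_failures(sample)
for block, wanted, found in failures:
    print(f"block {block}: wanted transid {wanted}, found {found}")
```

Listing the distinct block numbers this way helps to tell whether repeated I/O errors all point at the same damaged tree block or at several.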
Hi,

On Mon, Oct 10, 2011 at 02:14:26AM +0000, dima wrote:
> Somehow my subvolume with /home got corrupted. When I booted the machine this
> morning (after perfectly normal shutdown) it gave me a bunch of kernel errors.

That's very strange. If it was a perfectly normal shutdown, I don't see
how this could happen. External disk damage or bad RAM would seem a
convenient excuse :)

> I found out that if I comment out my /home entry in fstab, it would
> boot ok. So the / is not corrupted. I then booted from the live CD and
> set "clear_cache" for /home instead of "inode_cache,space_cache"
>
> /dev/disk/by-label/btrfs-root / btrfs
> defaults,noatime,inode_cache,space_cache 0 0
> /dev/disk/by-label/btrfs-root /var/lib/btrfs-root btrfs
> defaults,noatime,subvolid=0 0 0
> #/dev/disk/by-label/btrfs-root /home btrfs
> defaults,noatime,subvol=__home-new,inode_cache,space_cache 0 0
> /dev/disk/by-label/btrfs-root /home btrfs
> defaults,noatime,subvol=__home-new,clear_cache 0 0
> /var/lib/btrfs-root/boot /boot none bind 0 0
>
> Then I could mount the /home subvolume.
>
> I also found the corrupted file
> ? -????????? ? ? ? ? ? 13.4.4.40.js

Chromium cache? Somebody recently reported a problem there. I wonder
what this browser does to the filesystem ... :)

> Whenever I try to access it I am getting Input/output error and the following
> error in the kernel.log
>
>
> Oct 10 10:38:03 yukikaze kernel: [34592.275080] parent transid verify failed on
> 105930436608 wanted 58565 found 134248
> Oct 10 10:38:03 yukikaze kernel: [34592.275161] BUG: scheduling while atomic:
> ls/2545/0x00000002

In most cases this bug is only a consequence of some btrfs BUG_ON;
please try to find it in your logs or reproduce the problem. The
'parent transid verify' problem may cause a BUG_ON further up the call
stack.

> My question - is it possible to delete this rogue file somehow or repair it?
> I tried to delete the directory that contained it, but got the same Input/output
> error.

Fsck to the rescue! Or, you can try Josef's repair [1] proggy to
retrieve the data from the volume (AFAIK it should work around the
parent transid problem). If all other files are fine, you can rebuild
/home from that.

david

[1] git://github.com/josefbacik/btrfs-progs.git
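
[Editor's sketch] The advice to look for the underlying btrfs BUG_ON in the logs can be approximated with a small script. This is a rough Python sketch, not an official tool: the helper name find_kernel_bugs is hypothetical, and the pattern assumes the "kernel BUG at <file>:<line>!" layout that shows up in the log posted later in this thread.

```python
import re

# Matches entries such as "kernel BUG at fs/btrfs/inode.c:3024!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def find_kernel_bugs(log_text):
    """Return (source_file, line_number) for every 'kernel BUG at' entry.

    Hypothetical helper for illustration; it does not interpret the
    oops that follows the BUG line.
    """
    return [(m.group(1), int(m.group(2))) for m in BUG_RE.finditer(log_text)]


# Two sample lines taken from the logs posted in this thread.
sample = "\n".join([
    "Oct 10 10:38:03 yukikaze kernel: [34592.275161] BUG: scheduling "
    "while atomic: ls/2545/0x00000002",
    "Oct 10 14:03:13 yukikaze kernel: [ 9836.993261] kernel BUG at "
    "fs/btrfs/inode.c:3024!",
])

bugs = find_kernel_bugs(sample)
# Keep only the hits that point into the btrfs sources.
btrfs_bugs = [bug for bug in bugs if bug[0].startswith("fs/btrfs/")]
for source_file, line_number in btrfs_bugs:
    print(f"BUG_ON hit at {source_file}:{line_number}")
```

The "scheduling while atomic" line is deliberately not matched, since it is the follow-up symptom rather than the BUG_ON itself.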
Thanks David,

The last shutdown was clean, but I had to power-cycle several times this
month. I am also mounting a swapfile via a loop device, so maybe this
also adds to the instability.

The corrupt file is a firefox source file
(mozilla-central/js/src/tests/e4x/XML/13.4.4.40.js). Interestingly, I
did not touch this file or rebuild firefox for about 3-4 days, so I have
no idea why it suddenly got corrupted.

When trying to remove the directory containing this file I am getting:

Oct 10 14:03:13 yukikaze kernel: [ 9836.993172] ------------[ cut here ]------------
Oct 10 14:03:13 yukikaze kernel: [ 9836.993261] kernel BUG at fs/btrfs/inode.c:3024!
Oct 10 14:03:13 yukikaze kernel: [ 9836.993340] invalid opcode: 0000 [#1] PREEMPT SMP
Oct 10 14:03:13 yukikaze kernel: [ 9836.993438] CPU 0
Oct 10 14:03:13 yukikaze kernel: [ 9836.993474] Modules linked in: reiserfs usb_storage uas ipv6 loop snd_hda_codec_hdmi snd_hda_codec_via sg snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd sp5100_tco i2c_piix4 radeon ttm drm_kms_helper drm i2c_algo_bit firewire_ohci psmouse ppdev shpchp evdev serio_raw pcspkr firewire_core pci_hotplug i2c_core edac_core soundcore snd_page_alloc asus_atk0110 k10temp edac_mce_amd parport_pc parport crc_itu_t r8169 mii button wmi powernow_k8 processor mperf usbhid hid sr_mod cdrom sd_mod pata_acpi ohci_hcd ehci_hcd pata_atiixp ahci libahci libata scsi_mod usbcore
Oct 10 14:03:13 yukikaze kernel: [ 9836.994630]
Oct 10 14:03:13 yukikaze kernel: [ 9836.994662] Pid: 3043, comm: rm Not tainted 3.0.6-aya1 #3 System manufacturer System Product Name/M4A785TD-V EVO
Oct 10 14:03:13 yukikaze kernel: [ 9836.994840] RIP: 0010:[<ffffffff811f5221>] [<ffffffff811f5221>] btrfs_unlink+0xd1/0xe0
Oct 10 14:03:13 yukikaze kernel: [ 9836.994983] RSP: 0018:ffff8800a616fe28 EFLAGS: 00010282
Oct 10 14:03:13 yukikaze kernel: [ 9836.995070] RAX: 00000000fffffffe RBX: ffff8801178f6240 RCX: 000000000331d8c0
Oct 10 14:03:13 yukikaze kernel: [ 9836.995185] RDX: 000000000331d880 RSI: 0000000000018dc0 RDI: ffffea0003d28130
Oct 10 14:03:13 yukikaze kernel: [ 9836.995301] RBP: ffff8800a616fe58 R08: ffffffff811c7dda R09: 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.995416] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000fffffffe
Oct 10 14:03:13 yukikaze kernel: [ 9836.995530] R13: ffff880096fb05c8 R14: ffff8801186ad800 R15: ffff8800426bbf88
Oct 10 14:03:13 yukikaze kernel: [ 9836.995646] FS: 00007f54a0d6e700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.995777] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 10 14:03:13 yukikaze kernel: [ 9836.995870] CR2: 0000000001ddf0b8 CR3: 00000001081d9000 CR4: 00000000000006f0
Oct 10 14:03:13 yukikaze kernel: [ 9836.995984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.996099] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Process rm (pid: 3043, threadinfo ffff8800a616e000, task ffff8800967e1d00)
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Stack:
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] 0000000000000000 ffff880012f8b300 0000000000000000 ffff880096fb05c8
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] 0000000000000000 0000000000000003 ffff8800a616fe88 ffffffff8115a42f
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] ffff8800a616fe88 ffff880012f8b300 ffff8800426bbf88 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Call Trace:
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115a42f>] vfs_unlink+0x9f/0x110
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115a63a>] do_unlinkat+0x19a/0x1c0
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff811496b6>] ? filp_close+0x66/0x90
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115b332>] sys_unlinkat+0x22/0x40
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8144b602>] system_call_fastpath+0x16/0x1b
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Code: 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 4c 89 fe 48 89 df e8 e5 cd ff ff 85 c0 74 b8 0f 0b <0f> 0b 41 89 c4 eb c9 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] RIP [<ffffffff811f5221>] btrfs_unlink+0xd1/0xe0
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] RSP <ffff8800a616fe28>
Oct 10 14:03:13 yukikaze kernel: [ 9837.023860] ---[ end trace 771cebd6df5534bd ]---

I ran btrfsck with the latest btrfs-tools. After

item 33 key (150121906176 EXTENT_ITEM 4096) itemoff 2234 itemsize 51
extent refs 1 gen 33099 flags 2
tree block key (1215402 1 0) level 0
tree block backref root 257

(i.e. very early, about 4-5 seconds after I started checking) it gave me
an error:

failed to find block number 150121762816

Unless I touch this file, the FS is fully functional. Yes, I can create
a new subvolume of course, but as I mentioned before, there is a big
chance that the corrupted one will not be deleted cleanly and my disk
will get bloated even more with junk data I can do nothing about.

thanks
~dima
On Mon, Oct 10, 2011 at 11:03:34AM +0000, dima wrote:
> The last shutdown was clean, but I had to powercycle several times this month.
> I am also mounting a swapfile via loop device, so maybe this also adds up to
> instability.
>
> The corrupt file is a firefox source file
> (mozilla-central/js/src/tests/e4x/XML/13.4.4.40.js). Interesting thing that I
> did not touch this file or rebuild firefox for about 3-4 days, so I do not have
> any idea why it got corrupted suddenly.
>
> When trying to remove the directory containing this file I am getting:
>
> Oct 10 14:03:13 yukikaze kernel: [ 9836.993172] ------------[ cut here
> ]------------
> Oct 10 14:03:13 yukikaze kernel: [ 9836.993261] kernel BUG at
> fs/btrfs/inode.c:3024!

fixed by:

commit b532402e4d147e4f409c4e7f50d4413e8450101d
Author: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Date:   Tue Jul 19 07:27:20 2011 +0000

    Btrfs: return error to caller when btrfs_unlink() failes

    When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink() are
    error, the error code is returned to the caller instead of BUG_ON().

david
David Sterba wrote:

>> Then I could mount the /home subvolume.
>>
>> I also found the corrupted file
>> ? -????????? ? ? ? ? ? 13.4.4.40.js
>
> Chromium cache? Somebody recently reported a problem there. I wonder
> what this browser does to the filesystem ... :)

If you meant me by "somebody": no, my problem was not related to
Chromium usage. The problems only arose there because of a previous
"cp --reflink" issue while I continued browsing. ;-)

So it is pure coincidence: browser caches are a likely destination for
write access, and my system came to a complete halt due to a
browsing-unrelated file operation (cp --reflink).

Regards,
Kai
Oh, I see. The fix is not in 3.0.x but on the master branch. I will need
the latest 3.1 RC. I will try this.

Thanks David
I have upgraded to 3.1 rc8. I created a new subvolume for /home, copied
the files there from the old subvolume and deleted the old subvolume. It
looks like the space has been reclaimed fine. However, btrfsck still
gives me the same error:

failed to find block number 150121762816