Hello,

I have noticed that my server experiences a high load average when writing to it. So I checked the file system and found errors:

./btrfsck /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 989306aa-d291-4752-8477-0baf94f8c42f
checking extents
checking free space cache
checking fs roots
root 256 inode 9579 errors 100
root 256 inode 9580 errors 100
root 256 inode 14258 errors 100
root 256 inode 14259 errors 100
root 4444 inode 9579 errors 100
root 4444 inode 9580 errors 100
root 4444 inode 14258 errors 100
root 4444 inode 14259 errors 100
found 1478386534452 bytes used err is 1
total csum bytes: 3207847732
total tree bytes: 3902853120
total fs tree bytes: 38875136
total extent tree bytes: 135856128
btree space waste bytes: 411653937
file data blocks allocated: 3426722545664
 referenced 3426000965632
Btrfs v0.20-rc1-358-g194aa4a

It is a file system striped over two physical disks.

Now, what concerns me is that, apart from the performance, I found no indication of problems whatsoever. Nothing in the syslog.
Now that I am searching, I see this in dmesg:
[95764.899359] [<ffffffffa00d9a59>] free_fs_root+0x99/0xa0 [btrfs]
[95764.899384] [<ffffffffa00dd653>] btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
[95764.899408] [<ffffffffa00dd74f>] del_fs_roots+0xcf/0x130 [btrfs]
[95764.899433] [<ffffffffa00ddac6>] close_ctree+0x146/0x270 [btrfs]
[95764.899461] [<ffffffffa00b4eb9>] btrfs_put_super+0x19/0x20 [btrfs]
[95764.899493] [<ffffffffa00b754a>] btrfs_kill_super+0x1a/0x90 [btrfs]

The fact that the load went up indicates to me that the system struggled with reading or writing. Can't this "struggling" be detected and reported? Wouldn't this contribute to data safety?

And the more pressing question for me now: How can I fix the problem?

Greetings,
Hendrik
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

> Now that I am searching, I see this in dmesg:
> [95764.899359] [<ffffffffa00d9a59>] free_fs_root+0x99/0xa0 [btrfs]
> [95764.899384] [<ffffffffa00dd653>] btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
> [95764.899408] [<ffffffffa00dd74f>] del_fs_roots+0xcf/0x130 [btrfs]
> [95764.899433] [<ffffffffa00ddac6>] close_ctree+0x146/0x270 [btrfs]
> [95764.899461] [<ffffffffa00b4eb9>] btrfs_put_super+0x19/0x20 [btrfs]
> [95764.899493] [<ffffffffa00b754a>] btrfs_kill_super+0x1a/0x90 [btrfs]

Need to see the rest of the trace this came from.

Hello,

sorry about that:
[  126.444603] init: plymouth-stop pre-start process (3446) terminated with status 1
[11189.299864] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[94999.489736] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140408 /dev/sdc1
[94999.489755] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140408 /dev/sdb1
[95394.400840] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140420 /dev/sdb1
[95394.400872] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140420 /dev/sdc1
[95585.149738] init: smbd main process (1168) killed by TERM signal
[95725.171156] nfsd: last server has exited, flushing export cache
[95764.899173] ------------[ cut here ]------------
[95764.899216] WARNING: CPU: 1 PID: 21798 at /home/apw/COD/linux/fs/btrfs/disk-io.c:3423 free_fs_root+0x99/0xa0 [btrfs]()
[95764.899219] Modules linked in: nvram pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm nfsd nfs_acl auth_rpcgss nfs fscache binfmt_misc lockd sunrpc ftdi_sio usbserial stv6110x lnbp21 snd_hda_codec_realtek snd_hda_intel stv090x snd_hda_codec snd_hwdep snd_pcm ddbridge dvb_core snd_timer snd soundcore snd_page_alloc cxd2099(C) mei_me psmouse i915 drm_kms_helper mei drm lpc_ich i2c_algo_bit serio_raw video mac_hid coretemp lp parport hid_generic usbhid hid btrfs raid6_pq e1000e ptp pps_core ahci libahci xor zlib_deflate libcrc32c
[95764.899294] CPU: 1 PID: 21798 Comm: umount Tainted: GF CIO 3.11.0-031100rc2-generic #201307211535
[95764.899297] Hardware name: /DH87RL, BIOS RLH8710H.86A.0320.2013.0606.1802 06/06/2013
[95764.899300] 0000000000000d5f ffff880118b59cb8 ffffffff8171e74d 0000000000000007
[95764.899306] 0000000000000000 ffff880118b59cf8 ffffffff8106532c ffff880118b59d08
[95764.899311] ffff8801184cb800 ffff8801184cb800 ffff880118118000 ffff880118b59d78
[95764.899315] Call Trace:
[95764.899324] [<ffffffff8171e74d>] dump_stack+0x46/0x58
[95764.899331] [<ffffffff8106532c>] warn_slowpath_common+0x8c/0xc0
[95764.899336] [<ffffffff8106537a>] warn_slowpath_null+0x1a/0x20
[95764.899359] [<ffffffffa00d9a59>] free_fs_root+0x99/0xa0 [btrfs]
[95764.899384] [<ffffffffa00dd653>] btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
[95764.899408] [<ffffffffa00dd74f>] del_fs_roots+0xcf/0x130 [btrfs]
[95764.899433] [<ffffffffa00ddac6>] close_ctree+0x146/0x270 [btrfs]
[95764.899441] [<ffffffff811cd24e>] ? evict_inodes+0xce/0x130
[95764.899461] [<ffffffffa00b4eb9>] btrfs_put_super+0x19/0x20 [btrfs]
[95764.899467] [<ffffffff811b47e2>] generic_shutdown_super+0x62/0xf0
[95764.899475] [<ffffffff811b4906>] kill_anon_super+0x16/0x30
[95764.899493] [<ffffffffa00b754a>] btrfs_kill_super+0x1a/0x90 [btrfs]
[95764.899500] [<ffffffff811b512d>] deactivate_locked_super+0x4d/0x80
[95764.899505] [<ffffffff811b57ae>] deactivate_super+0x4e/0x70
[95764.899510] [<ffffffff811d1266>] mntput_no_expire+0x106/0x160
[95764.899515] [<ffffffff811d2b79>] SyS_umount+0xa9/0xf0
[95764.899520] [<ffffffff817333ef>] tracesys+0xe1/0xe6
[95764.899524] ---[ end trace 0024dfebf572e76c ]---
[95764.985245] VFS: Busy inodes after unmount of sdb1. Self-destruct in 5 seconds. Have a nice day...
[95790.079663] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140425 /dev/sdb1
[95790.101778] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140425 /dev/sdc1
[95790.162960] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140425 /dev/sdb1
[95790.163825] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140425 /dev/sdc1
[95924.393344] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140425 /dev/sdb1
[95924.421118] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140425 /dev/sdc1
[95924.676571] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140425 /dev/sdb1
[95924.677046] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140425 /dev/sdc1

Greetings,
Hendrik

On 02.11.2013 09:12, cwillu wrote:
>> Now that I am searching, I see this in dmesg:
>> [95764.899359] [<ffffffffa00d9a59>] free_fs_root+0x99/0xa0 [btrfs]
>> [95764.899384] [<ffffffffa00dd653>] btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
>> [95764.899408] [<ffffffffa00dd74f>] del_fs_roots+0xcf/0x130 [btrfs]
>> [95764.899433] [<ffffffffa00ddac6>] close_ctree+0x146/0x270 [btrfs]
>> [95764.899461] [<ffffffffa00b4eb9>] btrfs_put_super+0x19/0x20 [btrfs]
>> [95764.899493] [<ffffffffa00b754a>] btrfs_kill_super+0x1a/0x90 [btrfs]
>
> Need to see the rest of the trace this came from.

--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Mobile 0178 1874363

Hello,

the list was quite full with patches, so this might have been hidden.
Here the complete Stack.
Does this help? Is this what you needed?

Greetings,
Hendrik

> sorry about that:
> [...]

On Mon, Nov 4, 2013 at 3:14 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> the list was quite full with patches, so this might have been hidden.
> Here the complete Stack.
> Does this help? Is this what you needed?

>> [95764.899294] CPU: 1 PID: 21798 Comm: umount Tainted: GF CIO
>> 3.11.0-031100rc2-generic #201307211535

Can you reproduce the problem under the released 3.11 or 3.12? An -rc2 is still pretty early in the release cycle, and I wouldn't be at all surprised if it was a bug added and fixed in a later rc.

Hello,

sorry, I was totally unaware that I was still on 3.11-rc2.

I re-ran btrfsck with the same result:
./btrfs-progs/btrfsck /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 989306aa-d291-4752-8477-0baf94f8c42f
checking extents
checking free space cache
checking fs roots
root 256 inode 9579 errors 100
root 256 inode 9580 errors 100
root 256 inode 14258 errors 100
root 256 inode 14259 errors 100
root 4444 inode 9579 errors 100
root 4444 inode 9580 errors 100
root 4444 inode 14258 errors 100
root 4444 inode 14259 errors 100
found 1992865028914 bytes used err is 1
total csum bytes: 3207847732
total tree bytes: 3902865408
total fs tree bytes: 38875136
total extent tree bytes: 135864320
btree space waste bytes: 411665032
file data blocks allocated: 3426722545664
 referenced 3426000965632
Btrfs v0.20-rc1-358-g194aa4a

Now dmesg and the syslog stay clear of entries related to btrfs.
But I think that might also be a coincidence: I ran the old kernel for weeks until this error came up, whereas I have run this kernel for merely 12 hours.

Now:
Does it make sense to try further to find a possible bug, or do we suspect it is fixed? If so: How can I help?

And:
Can I fix these problems safely with btrfsck?

Regards,
Hendrik

On 05.11.2013 03:03, cwillu wrote:
> Can you reproduce the problem under the released 3.11 or 3.12? An
> -rc2 is still pretty early in the release cycle, and I wouldn't be at
> all surprised if it was a bug added and fixed in a later rc.

Hello again,

can someone please help me on this?

Regards,
Hendrik

Hendrik Friedel posted on Thu, 07 Nov 2013 20:16:34 +0100 as excerpted:

> can someone please help me on this?

Your replies are upside down (reply before the quoted context in which it should be taken, edited to replied context as appropriate), so I've not included further quoted context.

To answer the "is it safe to fix" question...

The answer is relative. Btrfs itself remains officially an experimental/development filesystem (as seen in the kernel option enabling it and on the btrfs wiki[1]), suitable only for testing with data that can be lost to the test without serious impact, either because you keep a tested backup of sufficient recency that you'd be comfortable declaring what's on btrfs a totally irrecoverable loss, or because it's simply scratch data for testing only in the first place.

In that context, yes, it's safe to btrfsck --repair, because you're prepared to lose the entire filesystem if worse comes to worse in any case, so even if btrfsck --repair makes things worse instead of better, you've not lost anything you're particularly worried about anyway.

If that's /not/ the case, then you really should be reexamining your choice of btrfs in the first place, as your stability requirements simply are not covered by btrfs at this point. Either choose another filesystem or change your backup practices and thus your stability requirements to be in line with btrfs' current state.

[1] https://btrfs.wiki.kernel.org

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Hello,

thanks for your reply.

> To answer the "is it safe to fix" question...
>
> In that context, yes, it's safe to btrfsck --repair, because you're
> prepared to lose the entire filesystem if worse comes to worse in any
> case, so even if btrfsck --repair makes things worse instead of better,
> you've not lost anything you're particularly worried about anyway.

I do have a daily backup of the important data.
There is other data that is (a bit more than) nice to keep (TV recordings). It all still seems readable, so I could also back this up if I could free some space.

So, I have run btrfsck --repair:
-------
root@homeserver:~/btrfs/btrfs-progs# git pull
remote: Counting objects: 124, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 99 (delta 55), reused 89 (delta 47)
Unpacking objects: 100% (99/99), done.
From git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs
   d1570a0..c652e4e  integration -> origin/integration
Already up-to-date.
-------

The repair:
-------
./btrfsck --repair /dev/sdc1
enabling repair mode
Checking filesystem on /dev/sdc1
UUID: 989306aa-d291-4752-8477-0baf94f8c42f
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 9579 errors 100
root 256 inode 9580 errors 100
root 256 inode 14258 errors 100
root 256 inode 14259 errors 100
root 4444 inode 9579 errors 100
root 4444 inode 9580 errors 100
root 4444 inode 14258 errors 100
root 4444 inode 14259 errors 100
found 2895817096773 bytes used err is 1
total csum bytes: 3206482672
total tree bytes: 3901480960
total fs tree bytes: 38912000
total extent tree bytes: 135892992
btree space waste bytes: 411727425
file data blocks allocated: 3446512275456
 referenced 3445793439744
Btrfs v0.20-rc1-358-g194aa4
-------

After the repair, another check reveals the same errors as before:
-------
./btrfsck /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 989306aa-d291-4752-8477-0baf94f8c42f
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 9579 errors 100
root 256 inode 9580 errors 100
root 256 inode 14258 errors 100
root 256 inode 14259 errors 100
root 4444 inode 9579 errors 100
root 4444 inode 9580 errors 100
root 4444 inode 14258 errors 100
root 4444 inode 14259 errors 100
found 2895817096773 bytes used err is 1
total csum bytes: 3206482672
total tree bytes: 3901480960
total fs tree bytes: 38912000
total extent tree bytes: 135892992
btree space waste bytes: 411727425
file data blocks allocated: 3446512275456
 referenced 3445793439744
Btrfs v0.20-rc1-358-g194aa4a
-------

The only messages in syslog/dmesg regarding btrfs are:
[299517.270322] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140436 /dev/sdc1
[299525.805867] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140436 /dev/sdb1
[299525.807148] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 140436 /dev/sdc1
[299525.808277] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1 transid 140436 /dev/sdb1
(repeating several times)

Can we find out why btrfsck does not fix the errors?

Greetings,
Hendrik

Hello,

I re-post this:

> So, I have run btrfsck --repair:
> [...]
> After the repair, another check reveals the same errors as before:
> [...]
> Can we find out, why btrfsck does not fix the errors?

I got no reply to this. Now, I have two intentions:
- Help improve btrfs(ck)
- Make the system usable again

Please let me know whether it is of interest to use this example to work on btrfsck, which apparently cannot fix this problem at the moment, and what information you would need from me.
Otherwise, I can proceed to the second point.

Greetings,
Hendrik

Hendrik Friedel <hendrik@friedels.name> wrote:

> I re-post this:
>
[...]
>> root 256 inode 9579 errors 100
>> root 256 inode 9580 errors 100
>> root 256 inode 14258 errors 100
>> root 256 inode 14259 errors 100
>> root 4444 inode 9579 errors 100
>> root 4444 inode 9580 errors 100
>> root 4444 inode 14258 errors 100
>> root 4444 inode 14259 errors 100
>> found 2895817096773 bytes used err is 1

100 is I_ERR_FILE_EXTENT_DISCOUNT. I'm not sure what kind of problem this indicates but btrfsck does not seem to fix this currently - it just detects it.

I'm living with errors 400 (I_ERR_FILE_NBYTES_WRONG) and 2000 (I_ERR_LINK_COUNT_WRONG) and had no problem with that yet. I suppose you can simply ignore it for the time being, ensure you have a working backup and hope the kernel handles it well when it encounters such "broken" inodes.

And from what I've read in the past btrfs is designed to handle and fix most errors on the fly from within the kernel. So it may just "fix" it when such an inode is modified. Thus, btrfsck is meant just as a tool to fix errors that can't be handled in kernel space. I may be wrong however, experts on the list could give a more detailed insight.

BTW, my first impression was that "errors 400" means something like "400 errors" - but that is just a hex bitmask which shows what errors have been found. So "errors 100" is just _one_ bit set, thus only _one_ error.

You can use "btrfs subvolume list" to identify which subvolume 4444 is and maybe recreate it or just delete it if it is disposable. The errors should be gone then. That won't work for subvolume 256, however, for it being the root subvolume obviously.

The last of the quoted errors, by pure guessing, probably indicates a problem with the space cache. But I think you already tried discarding it. Did you run btrfsck right after discarding it without regenerating the space cache? Does it still show that error then?

Regards,
Kai
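
To illustrate the bitmask reading described above, here is a minimal Python sketch that pulls root, inode and mask out of lines such as "root 256 inode 9579 errors 100" and names the bits that are set. Only the three flag names mentioned in this thread are filled in; the table KNOWN_FLAGS and the helpers parse_line and describe are made up for this example and are not part of btrfs-progs.

```python
import re

# Flag names taken from this thread only; any other bit is reported by value.
KNOWN_FLAGS = {
    0x100: "I_ERR_FILE_EXTENT_DISCOUNT",
    0x400: "I_ERR_FILE_NBYTES_WRONG",
    0x2000: "I_ERR_LINK_COUNT_WRONG",
}

# Matches the per-inode lines printed in the "checking fs roots" phase.
LINE_RE = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-fA-F]+)")


def parse_line(line):
    """Return (root, inode, mask) for an inode error line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # The value after "errors" is hexadecimal, not a count.
    return int(m.group(1)), int(m.group(2)), int(m.group(3), 16)


def describe(mask):
    """Name every bit that is set in the error mask."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(KNOWN_FLAGS.get(bit, "unnamed bit 0x%x" % bit))
        bit <<= 1
    return names


if __name__ == "__main__":
    root, inode, mask = parse_line("root 256 inode 9579 errors 100")
    print(root, inode, describe(mask))
```

Lines that do not have the "root ... inode ... errors ..." shape, such as "checking extents", are simply skipped by parse_line returning None.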

Kai Krakow posted on Tue, 12 Nov 2013 00:58:59 +0100 as excerpted:

> Hendrik Friedel <hendrik@friedels.name> wrote:
>
>> I re-post this:
>>
> [...]
>>> root 256 inode 9579 errors 100
>>> root 256 inode 9580 errors 100
>>> root 256 inode 14258 errors 100
>>> root 256 inode 14259 errors 100
>>> root 4444 inode 9579 errors 100
>>> root 4444 inode 9580 errors 100
>>> root 4444 inode 14258 errors 100
>>> root 4444 inode 14259 errors 100
>>> found 2895817096773 bytes used err is 1
>
> 100 is I_ERR_FILE_EXTENT_DISCOUNT. I'm not sure what kind of problem
> this indicates but btrfsck does not seem to fix this currently - it just
> detects it.

Interesting...

> I'm living with errors 400 (I_ERR_FILE_NBYTES_WRONG) and 2000
> (I_ERR_LINK_COUNT_WRONG) and had no problem with that yet. I suppose you
> can simply ignore it for the time being, ensure you have a working
> backup and hope the kernel handles it well when it encounters such
> "broken" inodes.
>
> And from what I've read in the past btrfs is designed to handle and fix
> most errors on the fly from within the kernel. So it may just "fix" it
> when such an inode is modified. Thus, btrfsck is meant just as a tool to
> fix errors that can't be handled in kernel space. I may be wrong
> however, experts on the list could give a more detailed insight.
>
> BTW, my first impression was that "errors 400" means something like "400
> errors" - but that is just a hex bitmask which shows what errors have
> been found. So "errors 100" is just _one_ bit set, thus only _one_
> error.

Same impression here, tho I did wonder at the conveniently even number of errors... Perhaps "errors" should be retermed "error-mask" or some such, to make the meaning clearer?

> You can use "btrfs subvolume list" to identify which subvolume 4444 is
> and maybe recreate it or just delete it if it is disposable. The errors
> should be gone then. That won't work for subvolume 256, however, for it
> being the root subvolume obviously.

FWIW, that's only one set of _four_ errors total, listed twice, once for each subvolume (which here is very likely a snapshot), they apply to. The duplicate inode numbers on each "root" are a clue.

So while removing subvolume 4444 would kill the second listing of errors, it wouldn't change the fact that there's four errors there; it'd only remove the second, duplicate listing since that snapshot would no longer exist.

> The last of the quoted errors, by pure guessing, probably indicates a
> problem with the space cache. But I think you already tried discarding
> it. Did you run btrfsck right after discarding it without regenerating
> the space cache? Does it still show that error then?

Is that even possible? According to the wiki, the clear_cache mount option is supposed to clear it, but it doesn't disable the option, which remains enabled, and regeneration would start immediately. The nospace_cache option should disable it, but I'm not sure if it's persistent across multiple mount cycles or not. (I know the space_cache option is documented as persistent, and in fact, I never even had to enable it here, that was the kernel default when I first mounted my btrfs filesystems, but I don't know if nospace_cache toggles the persistence too, or just disables it for that mount.)

[In case it's not clear, I'm simply an admin testing btrfs on my systems too. I've been on-list for several months now, but I'm not a dev and have no knowledge of the code itself, only what I've read on the wiki and list, and my own experience.]

--
Duncan
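
The "four errors, listed twice" observation can be checked mechanically. The following Python sketch groups the reported lines by inode number and error mask and lists the roots that report each pair. The function name group_by_inode and the REPORT constant are assumptions for this illustration. The script only shows that the same inode numbers repeat under both roots; that root 4444 is a snapshot sharing those inodes remains an inference, which the script cannot confirm.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-fA-F]+)")

# The eight lines as posted in this thread.
REPORT = """\
root 256 inode 9579 errors 100
root 256 inode 9580 errors 100
root 256 inode 14258 errors 100
root 256 inode 14259 errors 100
root 4444 inode 9579 errors 100
root 4444 inode 9580 errors 100
root 4444 inode 14258 errors 100
root 4444 inode 14259 errors 100
"""


def group_by_inode(report):
    """Map (inode, mask) to the sorted list of roots that report it."""
    roots = defaultdict(set)
    for m in LINE_RE.finditer(report):
        root = int(m.group(1))
        inode = int(m.group(2))
        mask = int(m.group(3), 16)  # the mask is printed in hex
        roots[(inode, mask)].add(root)
    return {key: sorted(value) for key, value in roots.items()}


if __name__ == "__main__":
    for (inode, mask), roots in sorted(group_by_inode(REPORT).items()):
        print("inode %d errors %x reported under roots %s" % (inode, mask, roots))
```

Grouping on the (inode, mask) pair rather than on the inode alone keeps two roots apart if they ever reported different masks for the same inode number.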
Duncan <1i5t5.duncan@cox.net> wrote:

>> 100 is I_ERR_FILE_EXTENT_DISCOUNT. I'm not sure what kind of problem
>> this indicates but btrfsck does not seem to fix this currently - it just
>> detects it.
>
> Interesting...

I wish it were documented what it technically means and what implications
each error could have, so one could infer its severity from it. But I
could not find anything, just source code which seems to only detect these
errors. It looks like btrfsck's purpose is to fix internal tree structures
but not inode errors.

>> BTW, my first impression was that "errors 400" means something like "400
>> errors" - but that is just a hex bitmask which shows what errors have
>> been found. So "errors 100" is just _one_ bit set, thus only _one_
>> error.
>
> Same impression here, tho I did wonder at the conveniently even number of
> errors... Perhaps "errors" should be retermed "error-mask" or some such,
> to make the meaning clearer?

Of course the numbers are even because they are powers of two:

  * error no. 1 is "errors 1" (2^0)
  * error no. 2 is "errors 2" (2^1)
  * error no. 3 is "errors 4" (2^2)
  * error no. 4 is "errors 8" (2^3)
  * error no. 5 is "errors 10" (2^4)
  * error no. 6 is "errors 20" (2^5)
  * ... 40, 80, 100, 200, 400, 800, 1000, 2000

If one or more distinct errors are found in an inode, the numbers are
simply added (in hex), so if you don't have error no. 1, numbers are
always even - that's the nature of a bit mask. If I happened to have
errors no. 3, 5, and 6 in one inode, this would result in "errors 34"
(0x4 + 0x10 + 0x20 = 0x34).

>> You can use "btrfs subvolume list" to identify which subvolume 4444 is
>> and maybe recreate it or just delete it if it is disposable. The errors
>> should be gone then. That won't work for subvolume 256, however, for it
>> being the root subvolume obviously.
>
> FWIW, that's only one set of _four_ errors total, listed twice, once for
> each subvolume (which here is very likely a snapshot), they apply to.
> The duplicate inode numbers on each "root" are a clue.
>
> So while removing subvolume 4444 would kill the second listing of errors,
> it wouldn't change the fact that there's four errors there; it'd only
> remove the second, duplicate listing since that snapshot would no longer
> exist.

Oh, good pointer. That is probably true. I actually did not think of that
possibility when I was writing that post.

>> The last of the quoted errors, by pure guessing, probably indicates a
>> problem with the space cache. But I think you already tried discarding
>> it. Did you run btrfsck right after discarding it without regenerating
>> the space cache? Does it still show that error then?
>
> Is that even possible? According to the wiki, the clear_cache mount
> option is supposed to clear it, but it doesn't disable the option, which
> remains enabled, and regeneration would start immediately. The
> nospace_cache option should disable it, but I'm not sure if it's
> persistent across multiple mount cycles or not. (I know the space_cache
> option is documented as persistent, and in fact, I never even had to
> enable it here, that was the kernel default when I first mounted my btrfs
> filesystems, but I don't know if nospace_cache toggles the persistence
> too, or just disables it for that mount.)

Possible? Yes. Although I did not explicitly mention it, you would
combine "clear_cache" and "nospace_cache" - that should do the trick.
Then unmount and check.

> [In case it's not clear, I'm simply an admin testing btrfs on my systems
> too. I've been on-list for several months now, but I'm not a dev and
> have no knowledge of the code itself, only what I've read on the wiki and
> list, and my own experience.]

I second that. I use btrfs just on my private desktop PC, evaluating it,
and I'm quite happy with it. But it still has many bugs which mostly occur
during IO stress, so I would not trust it with any server data yet: on
servers exposed to the internet you mostly cannot control what stress is
put on them. But I'm eagerly looking forward to the possibilities it will
offer for server systems - some time in the future.

Until then, I'm a Linux server administrator, maintaining some systems for
a hosting company, running XFS on them (which proved almost unbreakable),
analysing problematic system behavior and all this stuff, and I have some
background in kernel-near, low-level, and application programming - the
latter being probably the best reason for being on this list.

Regards,
Kai
Kai Krakow posted on Tue, 12 Nov 2013 20:37:57 +0100 as excerpted:

>>> BTW, my first impression was that "errors 400" means something like
>>> "400 errors" - but that is just a hex bitmask which shows what errors
>>> have been found. So "errors 100" is just _one_ bit set, thus only
>>> _one_ error.
>>
>> Same impression here, tho I did wonder at the conveniently even number
>> of errors... Perhaps "errors" should be retermed "error-mask" or some
>> such, to make the meaning clearer?
>
> Of course the numbers are even because they are powers of two:

That's what I meant: Once I read that they were bit-flags and thus powers
of two represented in octal or hex, it made sense.

Before that, I had idly/sub-consciously wondered why errors
"coincidentally" seemed to always occur in nice round batches of
X-hundred, etc, but it hadn't yet risen to a level of consciousness where
I was even aware what it was that seemed odd about it -- that only
happened in hindsight once I read the bitflags explanation and realized
what had been subconsciously bothering me about the "too round" numbers I
was interpreting them as, before.

-- 
Duncan - List replies preferred.   No HTML msgs.
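[Editorial sketch] The bit-mask reading of the "errors" field discussed above can be illustrated with a few lines of Python. This is only a sketch: the three flag names and values are the ones mentioned in this thread (100, 400, 2000 hex); every other bit is reported without a name rather than guessed, and `decode_errors` is a hypothetical helper, not a btrfs-progs function.

```python
from __future__ import annotations

# Flag values and names as given in this thread; other bits exist in the
# mask but their names are not stated here, so they are left unnamed.
KNOWN_FLAGS = {
    0x100: "I_ERR_FILE_EXTENT_DISCOUNT",
    0x400: "I_ERR_FILE_NBYTES_WRONG",
    0x2000: "I_ERR_LINK_COUNT_WRONG",
}


def decode_errors(field: str) -> list[str]:
    """Split a hexadecimal "errors" field into its individual set bits."""
    mask = int(field, 16)
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(KNOWN_FLAGS.get(bit, f"unnamed flag 0x{bit:x}"))
        bit <<= 1
    return names


if __name__ == "__main__":
    for field in ("100", "34", "2400"):
        print(f"errors {field}: {decode_errors(field)}")
```

So "errors 100" decodes to a single flag, while the "errors 34" example from the thread decodes to three separate flags (0x4, 0x10 and 0x20).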
Kai Krakow posted on Tue, 12 Nov 2013 20:37:57 +0100 as excerpted:

> Possible? Yes. Although I did not explicitly mention it, you would
> combine "clear_cache" and "nospace_cache" - that should do the trick.
> Then unmount and check.

Thanks and mentally noted for further reference. I didn't think about
combining the options, but it makes perfect sense now that I have, thanks
to you. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Hello,

>> Possible? Yes. Although I did not explicitly mention it, you would
>> combine "clear_cache" and "nospace_cache" - that should do the trick.
>> Then unmount and check.
>
> Thanks and mentally noted for further reference. I didn't think about
> combining the options, but it makes perfect sense now that I have, thanks
> to you. =:^)

For me, it unfortunately did not work:

  mount /dev/sdc1 /mnt/BTRFS/Video/VDR -o clear_cache,nospace_cache
  [wait a day or two]

  Checking filesystem on /dev/sdc1
  UUID: 989306aa-d291-4752-8477-0baf94f8c42f
  checking extents
  checking free space cache
  cache and super generation don't match, space cache will be invalidated
  checking fs roots
  root 256 inode 9579 errors 100, file extent discount
  root 256 inode 9580 errors 100, file extent discount
  root 256 inode 14258 errors 100, file extent discount
  root 256 inode 14259 errors 100, file extent discount
  root 4444 inode 9579 errors 100, file extent discount
  root 4444 inode 9580 errors 100, file extent discount
  root 4444 inode 14258 errors 100, file extent discount
  root 4444 inode 14259 errors 100, file extent discount
  found 2928473450130 bytes used err is 1
  total csum bytes: 3206482672
  total tree bytes: 3902070784
  total fs tree bytes: 38912000
  total extent tree bytes: 136044544
  btree space waste bytes: 411777432
  file data blocks allocated: 3447164817408
   referenced 3446445981696
  Btrfs v0.20-rc1-596-ge9ac73b

Same as before (just a bit more verbose).

Does it help to delete the files at the affected inodes? How do I find
out which files are stored at these inodes?

Greetings,
Hendrik

-- 
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Mobil 0178 1874363
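[Editorial sketch] On the question of which files belong to the reported inodes: one generic approach is to walk the mounted subvolume and compare each entry's inode number, which is what `find -inum` does. The sketch below does the same in Python. It assumes the subvolume is mounted and readable; `find_paths_by_inode` is a hypothetical helper name. Because btrfs inode numbers are only unique within one subvolume, the walk deliberately stays on the starting directory's device ID. btrfs-progs also offers `btrfs inspect-internal inode-resolve`, which may be the more direct route where available.

```python
from __future__ import annotations

import os
import stat


def find_paths_by_inode(top: str, wanted: set[int]) -> dict[int, list[str]]:
    """Return, for each wanted inode number, the paths under top that have it.

    Entries on a different device ID than top (other mounts, and other
    btrfs subvolumes) are skipped and not descended into.
    """
    found: dict[int, list[str]] = {ino: [] for ino in wanted}
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        same_volume_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable: skip it
            if st.st_dev != top_dev:
                continue  # belongs to another mount or subvolume
            if st.st_ino in wanted:
                found[st.st_ino].append(path)
            if stat.S_ISDIR(st.st_mode):
                same_volume_dirs.append(name)
        # Restrict the walk to directories on the same device.
        dirnames[:] = same_volume_dirs
    return found


# Example call, using the mount point and inode numbers from this thread:
# find_paths_by_inode("/mnt/BTRFS/Video/VDR", {9579, 9580, 14258, 14259})
```

An inode with several hard links shows up with several paths. Whether deleting the files found this way clears the btrfsck errors is not answered in this thread.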